\chapter{Foundations} \label{ch:Foundations} This chapter gives an overview of the foundations this thesis is based upon. It examines recommender systems and basic approaches, describes group recommenders, introduces product configuration, shows what group based product configuration looks like and introduces the prototype that is extended by this thesis. \section{Recommender System} \label{sec:Foundations:RecommenderSystem} A recommender system is a system that gives individualized recommendations to users to guide them through a large space of objects \cite[~ p. 331]{burkeHybridRecommenderSystems2002}. There are several approaches to recommender systems presented in \cite{felfernigGroupRecommenderSystems2018}, some of them are: collaborative filtering, constraint-based recommendation and content-based filtering. There is also hybrid recommendation which combines approaches but this is not the focus in this thesis. \subsection{Collaborative Filtering} \begin{table}[tb] \centering \begin{tabular}{ l | c | c | c | c | c } & The Matrix & Titanic & Die Hard & Forest Gump & Wall-E \\ \hline John & 5 & 1 & & 2 & 2 \\ Lucy & 1 & 5 & 2 & 5 & 5 \\ Eric & 2 & ? & 3 & 5 & 4 \\ Diane & 4 & 3 & 5 & 3 & \\ \end{tabular} \caption[Movies: Rating Matrix]{An example showing users ratings for movies by \citeauthor{ningComprehensiveSurveyNeighborhoodBased2015} \cite{ningComprehensiveSurveyNeighborhoodBased2015}.} \label{tab:Foundations:RecommenderSystem:MoviePreferences} \end{table} In collaborative filtering a user's rating for unknown items is predicted by finding similar users who have rated it. Their rating is used as prediction \cite[~ pp. 7, 8]{felfernigDecisionTasksBasic2018}. Collaborative Filtering can not only be done using users, it can also be item-based. Hereby the similarity between items is used for a recommendation and not similar users \cite{ricciRecommenderSystemsHandbook2015}. \autoref{tab:Foundations:RecommenderSystem:MoviePreferences} shows an example rating matrix. A simple user-based way to calculate a rating would be to use a k-nearest neighbour (kNN) algorithm and then take the average of those ratings. Using this method with $k := 2$ and euclidean distance, Eric's closest neighbours are \textit{Lucy} and \textit{Diane} therefore giving a predicted rating of $4$. An item-based approach will try to find similar items based on the user's rating. Here, an example of similar items would be \textit{Forest Gump} and \textit{Wall-E} as John and Lucy each have given them the same rating and Eric's rating is off by one. Using again kNN with $k := 2$ it is found that \textit{Forest Gump} and \textit{Wall-E} are the most similar to \textit{Titanic} thereby having a predicted rating of $4.5$. However this simple similarity and prediction function does not take into account different distances. For example Lucy's ratings are more similar compared to Eric's than Diane's but Diane's and Lucy's ratings are valued the same. \subsection{Constraint-Based Recommendation} Hereby filter rules are defined which filter out items that do not fulfil specified rules. A user models their requirements with these rules and thereby gets a list of recommended items. This approach requires deep knowledge about a product because it requires a detailed description of features \cite[~ p. 12]{felfernigDecisionTasksBasic2018}. The previous movie example (see \autoref{tab:Foundations:RecommenderSystem:MoviePreferences}) requires additional information for example about plot structure, pacing, length and other attributes of the movie. Now the user can enter filtering criteria, e.g. that the movie should be no longer than 120 minutes, be categorized as action or thriller and have a fast pacing. The system will only recommend movies that fit into these constraints. \subsection{Content-Based Filtering} For the content-based filtering approach, items and users are assigned to categories. Based on consumption and rating of items a user will have implicit ratings for categories. Predictions are now made based on the categories of a new item \cite[~ pp. 10, 11]{felfernigDecisionTasksBasic2018}. Using the example from \autoref{tab:Foundations:RecommenderSystem:MoviePreferences} and using an additional category matrix (see \autoref{tab:Foundations:RecommenderSystem:ContentBasedFilteringCategories}), a rating matrix per category can be derived (using the average rating of the user of each movie contained in this category). The result can be seen in \autoref{tab:Foundations:RecommenderSystem:ContentBasedFilteringProfiles}. To predict Eric's rating of Titanic, the categories of \textit{Titanic} and averages of Eric's implicit rating per category are used. Titanic is only in the category romance and as Eric's rating of \textit{Forest Gump} is $5$ the prediction is a rating of $5$. Categories do not have to be the genre, they could be any kind of data relating to a movie. \begin{table} \centering \begin{tabular}{ l | c | c | c | c | c } & The Matrix & Titanic & Die Hard & Forest Gump & Wall-E \\ \hline Action & x & & x & & \\ Sci-Fi & x & & & & \\ Thriller & & & x & & \\ Romance & & x & & x & \\ Family & & & & x & x \\ \end{tabular} \caption[Movies: Category Matrix]{Showing example categories for movies in \autoref{tab:Foundations:RecommenderSystem:MoviePreferences}.} \label{tab:Foundations:RecommenderSystem:ContentBasedFilteringCategories} \end{table} \begin{table} \centering \begin{tabular}{ l | c | c | c | c | c } & Action & Sci-Fi & Thriller & Romance & Family \\ \hline John & 5 & 5 & & 1.5 & 2 \\ Lucy & 1.5 & 1 & 2 & 5 & 5 \\ Eric & 2.5 & 2 & 3 & 5 & 4.5 \\ Diane & 4.5 & 4 & 5 & 3 & 3 \\ \end{tabular} \caption[Movies: User Profile Matrix for Genres]{User profiles generated from categories and rating from \autoref{tab:Foundations:RecommenderSystem:MoviePreferences} and \autoref{tab:Foundations:RecommenderSystem:ContentBasedFilteringCategories}.} \label{tab:Foundations:RecommenderSystem:ContentBasedFilteringProfiles} \end{table} Advantages and disadvantages of basic recommendation techniques are listed in \autoref{tab:Foundations:RecommenderComparison}. The following subsections show advantages of content-based filtering over collaborative filtering and over constrained-based recommendation. \begin{table} \begin{center} \begin{tabularx}{\columnwidth}{X|X|X} & advantages & disadvantages \\ \hline Collaborative Filtering & \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt, leftmargin=3.5mm] \item Serendipity of results \item Automatic learning of market segments \item No domain knowledge required \end{itemize} & \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt, leftmargin=3.5mm] \item Cold start problem for users and items \item Grey sheep problem \item Quality based on rating quality \item Data sparsity \item Privacy not guaranteed \end{itemize} \\ \hline Content-Based Filtering & \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt, leftmargin=3.5mm] \item No community required \item User-independent \item Transparent \item No item cold start \item Simplicity \item Robust \item Stable to constant influx of new users \end{itemize} & \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt, leftmargin=3.5mm] \item Overspecialisation \item No serendipity \item User cold start problem \item Requires domain knowledge \end{itemize} \\ \hline Constraint-Based Recommendation & \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt, leftmargin=3.5mm] \item Transparent \item Good for non-discrete values \end{itemize} & \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt, leftmargin=3.5mm] \item Inconsistent constraints \item No results \end{itemize} \\ \end{tabularx} \caption[Comparison of Recommendation Approaches]{A description of the advantages and disadvantages of common recommendation techniques \cite{richthammerSituationAwarenessRecommender2018, shokeenStudyFeaturesSocial2019,hahslerRecommenderlabFrameworkDeveloping2015, aminiDiscoveringImpactKnowledge2011, suSurveyCollaborativeFiltering2009}} \label{tab:Foundations:RecommenderComparison} \end{center} \end{table} \subsubsection{Advantages over Collaborative Filtering} Collaborative filtering has several issues that content-based filtering doesn't have. According to \citeauthor{likaFacingColdStart2014} \cite{likaFacingColdStart2014} the \emph{cold start problem} is one of the well-known problems of recommender systems. It occurs when there is sparse information for users or items. In the case of collaborative filtering this issue concerns for both items and users. Content-based filtering does not have that issue with items as items are classified based on similarity to other items. The user cold start problem however still persists when a new user has not yet rated any items. Accordingly, no similar items can be recommended. Another common issue is the \emph{grey sheep problem} \cite{grasIdentifyingGreySheep2016}. Collaborative filtering approaches assume that users that are similar, have similar preferences. A user that is not similar to any of the current user or community of users fail that assumption. Therefore, good recommendations cannot be made. These users are called \emph{grey sheep users}. Item-based filtering does not have this issue as a user's preference is directly used to find similar items to the ones they like, not similar users. Usually, the need for domain knowledge is a disadvantage. However, as product configuration already has domain knowledge baked in to describe features and how they can be combined, this is not a disadvantage and can even be seen as an advantage. Hence, domain knowledge can directly be used and does not first need to be learned indirectly. Additionally, a collaborative filtering approach spans a larger comparison space, based on preferences, compared to content-based filtering that only uses the item attributes. Thus, for applications with a large solution space, reliance on product features instead of user similarity should be considered. Last, content-based filtering does not depend on historic group preference accuracy. Thus, malicious actors that try to manipulate the recommendation system do not decrease recommendation accuracy. The same is true for inaccurate preferences. For example, this occurs if a user's input into a system does not accurately reflect what they actually like. \subsubsection{Advantages over Constrained-Based Recommendation} In constrained-based recommendation approaches it is possible that constraints lead to no possible solution \cite[~ p. 44]{felfernigAlgorithmsGroupRecommendation2018}. This then requires further techniques of constrained relaxing and a user is faced with the situation that they have to search for constraints which fulfil less strict requirements. Moreover, in groups a constraint-based approach has to deal with contrary user constraints. Therefore, diverse groups could have issues with it. \section{Group Recommender System} \label{sec:Foundations:GroupRecommenderSystem} A group recommender system is a recommender system aimed at making recommendations for a group instead of a single user. To make recommendations preferences of all group members have to be aggregated. This can be done by either aggregating single user recommendations or by merging preferences of each user into a group preference model \cite{jamesonRecommendationGroups2007}. Based on the resulting preference model, recommendation strategies can be used to generate recommendations. The strategy of aggregating predictions can be further divided into two strategies. \citeauthor{felfernigAlgorithmsGroupRecommendation2018} \cite{felfernigAlgorithmsGroupRecommendation2018} describes merging recommendations and "ranking of candidate items". Merging recommendations can be used when multiple possible solutions are to be presented. The recommender picks $n$ recommendation from each user's individual recommendations and merges them into a list. The second approach is that each user's individual recommender ranks all items. The group member's specific rankings then are aggregated to get a group ranking of items. Instead of ranking it is also possible to simply predict a user's rating for an item. The aggregation of preferences uses a merging strategy to combine the individual preferences into group preferences. This allows a group to change its preferences during the course of the decision without changing individual preferences. Both the approach of merging preferences and the approach of using individual user's rankings require some kind of aggregation strategy. This section presents three strategies: multiplication, average and least misery. The multiplication strategy multiplies preferences of users and thereby combines them into a group preference. Similarly the average strategy takes the average of a rating and the least misery strategy takes the lowest rating among group members. To illustrate the example in \autoref{tab:Foundations:RecommenderSystem:MoviePreferences} is used. Lucy, Eric and Diane form a group. The resulting ratings for each strategy are shown in \autoref{tab:Foundations:RecommenderSystem:AggregationStrategy}. \begin{table} \centering \begin{tabular}{ l | c | c | c | c | c } & The Matrix & Titanic & Die Hard & Forest Gump & Wall-E \\ \hline multiplication & 8 & - & 30 & 75 & - \\ average & $\frac{7}{3}$ & - & $\frac{10}{3}$ & $\frac{13}{3}$ & - \\ least misery & 1 & - & 2 & 3 & - \\ \end{tabular} \caption[Movies: Result Matrix for Group Aggregation Strategies]{An example showing preference aggregation strategies for a group using the data from \autoref{tab:Foundations:RecommenderSystem:MoviePreferences}. Titanic and Wall-E were left out because not all group members have rated these movies.} \label{tab:Foundations:RecommenderSystem:AggregationStrategy} \end{table} \section{Product Configuration} \label{sec:Foundations:ProductConfiguration} Product configuration is a process consisting of a series of decision tasks whereby a product is constructed of components which interact with each other. During a configuration process no new components are created. Their interplay and specification is defined beforehand \cite[~ pp. 42, 43]{sabinProductConfigurationFrameworksa1998}. This section defines product configuration analogous to \citeauthor{felfernigOpenConfiguration2014} \cite{felfernigOpenConfiguration2014}. Formally a configuration problem can be specified as a \emph{constraint satisfaction problem (CSP)} \cite{tsangFoundationsConstraintSatisfaction1993} as \begin{equation} \label{eq:Foundations:ProductConfiguration:ConstraintSatisfactionProblem} CSP(V,\mathfrak{D},C), \end{equation} where $V$ is a set of \emph{variables} (which in this thesis will also be referred to as \emph{features}) with \begin{equation} \label{eq:Foundations:ProductConfiguration:Variables} V = \{v_1, \dotsc, v_m\}, \end{equation} a \emph{domain mapping} $\mathfrak{D}$ that maps variable to their corresponding domain values $D$ (which in this thesis will be also referred to as \emph{characteristics}) i.e. the values that can be assigned to each variable \begin{equation}\label{eq:Foundations:ProductConfiguration:DomainMapping} \mathfrak{D} : V \to D; x \mapsto \mathfrak{D}(x) \qquad where \ D = \{d_1, \dotsc, d_o\}, \end{equation} and \emph{constraints} $C$ that limit the solution space with \begin{equation}\label{eq:Foundations:ProductConfiguration:Constraints} C = \{c_1, \cdots, c_k\}. \end{equation} In group-based configuration (also known as collaborative or group configuration) a group instead of a single user is set to configure a product. This entails challenges in terms of synchronising workspaces and keeping the data consistent for every group member. \citeauthor{raabKollaborativeProduktkonfigurationEchtzeit2019}'s \cite{raabKollaborativeProduktkonfigurationEchtzeit2019} approach, which this thesis extends, is to treat the group configuration the same as one shared configuration and to sync the selection of attributes across clients. \subsection{Group-Based Product Configuration} \label{sec:Foundations:GroupBasedProductConfiguration} Group-based product configuration is an approach to product configuration where product configuration is done by multiple users. According to \citeauthor{felferningGroupBasedConfiguration2016} \cite{felferningGroupBasedConfiguration2016} considering classical approaches as single user tasks can lead to decisions that are suboptimal. Moreover, there are scenarios where a decision is inherently a group task such as \emph{Holiday Planning}, \emph{Software Release Planning} and \emph{Product Line Scoping}. Group-based configuration takes the representation of a group directly into account. \section{Base System} \label{sec:Foundations:BaseSystem} This thesis extends a base software system, \emph{CAS Configurator Merlin}, from \emph{CAS Software AG} \cite{CASSoftwareAG}. \citeauthor{raabKollaborativeProduktkonfigurationEchtzeit2019} \cite{raabKollaborativeProduktkonfigurationEchtzeit2019} extends CAS Merlin Configurator in his thesis to allow simultaneous configuration. Here groups of people are able to simultaneously configure a product. If there are any conflicts, a conflict resolution process is started. However the prototype only allows a majority voting approach and does not provide any further group decision support. Also this process only starts upon the selection of an invalid state that gives alternatives and for the main configuration process no recommendation or conflict resolution exists. The extended architecture is shown in \autoref{fig:Foundations:CollaborativeConfiguratorMerlin}. He only makes changes to M.Customer which is renamed to M.Collab-Customer and introduces a new component M.Collab. \begin{figure} \centering \includegraphics[width=0.6\textwidth]{./figures/20_fountations/MerlinCollaborativeConfigurator.pdf} \caption[Architecture: Collaborative Configurator]{Architecture of Collaborative Configurator Merlin \cite[Fig. 4.3]{raabKollaborativeProduktkonfigurationEchtzeit2019}} \label{fig:Foundations:CollaborativeConfiguratorMerlin} \end{figure} The following list provides a short overview of each component. \begin{description} \item[M.Core] provides the base of the configurator. It checks the configuration against all rules in the database, provides possible alternatives if a change invalidates other parts of a configuration. The system relies on a CSP solver for validation and suggestion of alternatives. \item[M.Model] is the editor to create products and rules. These rules can then be uploaded to M.Core. \item[M.Collab] is a node.js server application that communicates with M.Core via REST-API and with M.Collab-Customer via WebSocket. It sits in between M.Collab-Customer and M.Core and handles all processing regarding collaborative configuration. \item[M.Collab-Customer] is a modified version of M.Customer that does all communication via WebSocket and does communicate with M.Collab instead of M.Core. M.Customer is the customer facing component. It allows a customer to configure a product or solution. \end{description} \section{Software Design} \label{sec:Foundations:BaseSystem} In this thesis for the design of the prototype to improve software quality some rules of software engineering will be applied. The usage of \emph{design patterns} \cite{gamma2015design} and the rules of \emph{separation of concern} \cite{de2002importance} and \emph{single responsibility} \cite{martinCleanArchitectureCraftsman2017} are commonly known in the field of software engineering and, therefore, a detailed explanation is omitted here.