mirror of
https://github.com/13hannes11/bachelor_thesis.git
synced 2024-09-04 01:11:00 +02:00
reduce usage of therefore and replace by synonyms
This commit is contained in:
@@ -156,11 +156,11 @@ Advantages and disadvantages of basic recommendation techniques are listed in \a
|
||||
|
||||
\subsubsection{Advantages over Collaborative Filtering}
|
||||
|
||||
Collaborative filtering has several issues that content-based filtering doesn't have. According to \citeauthor{likaFacingColdStart2014} \cite{likaFacingColdStart2014} the \emph{cold start problem} is one of the well-known problems of recommender systems. It occurs when there is sparse information for users or items. In the case of collaborative filtering this issue concerns for both items and users. Content-based filtering does not have that issue with items as items are classified based on similarity to other items. The user cold start problem however still persists when a new user has not yet rated any items. Therefore, no similar items can be recommended.
|
||||
Collaborative filtering has several issues that content-based filtering doesn't have. According to \citeauthor{likaFacingColdStart2014} \cite{likaFacingColdStart2014} the \emph{cold start problem} is one of the well-known problems of recommender systems. It occurs when there is sparse information for users or items. In the case of collaborative filtering this issue concerns for both items and users. Content-based filtering does not have that issue with items as items are classified based on similarity to other items. The user cold start problem however still persists when a new user has not yet rated any items. Accordingly, no similar items can be recommended.
|
||||
Another common issue is the \emph{grey sheep problem}. \citeauthor{grasIdentifyingGreySheep2016} \cite{grasIdentifyingGreySheep2016}. Collaborative filtering approaches assume that users that are similar, have similar preferences. A user that is not similar to any of the current user or community of users fail that assumption. Therefore, good recommendations cannot be made. These users are called \emph{grey sheep users}. Item-based filtering does not have this issue as a user's preference is directly used to find similar items to the ones they like, not similar users.
|
||||
Usually, the need for domain knowledge is a disadvantage. However, as product configuration already has domain knowledge baked in to describe features and how they can be combined, this is not a disadvantage and can even be seen as an advantage. Therefore, domain knowledge can directly be used and does not first need to be learned indirectly.
|
||||
Usually, the need for domain knowledge is a disadvantage. However, as product configuration already has domain knowledge baked in to describe features and how they can be combined, this is not a disadvantage and can even be seen as an advantage. Hence, domain knowledge can directly be used and does not first need to be learned indirectly.
|
||||
Additionally, a collaborative filtering approach spans a larger comparison space, based on preferences, compared to content-based filtering that only uses the item attributes. Thus, for applications with a large solution space, reliance on product features instead of user similarity should be considered.
|
||||
Last, content-based filtering does not depend on historic group preference accuracy. Therefore, malicious actors that try to manipulate the recommendation system do not decrease recommendation accuracy. The same is true for inaccurate preferences. For example if a user's input into a system does not accurately reflect what they actually like.
|
||||
Last, content-based filtering does not depend on historic group preference accuracy. Thus, malicious actors that try to manipulate the recommendation system do not decrease recommendation accuracy. The same is true for inaccurate preferences. For example if a user's input into a system does not accurately reflect what they actually like.
|
||||
|
||||
\subsubsection{Advantages over Constrained-Based Recommendation}
|
||||
|
||||
|
||||
@@ -35,7 +35,7 @@ In practice, a finished configuration of a product is something that is ready to
|
||||
\subsection{Group-Based Product Configuration}
|
||||
\label{sec:Foundations:GroupBasedProductConfiguration}
|
||||
|
||||
Instead of a single person configuring a product, a group of people configures one product which can be useful in multi-stakeholder decisions. This setting needs mechanisms for describing the preferences of multiple people. Therefore, a set of users $U$ will be introduced with
|
||||
Instead of a single person configuring a product, a group of people configures one product which can be useful in multi-stakeholder decisions. This setting needs mechanisms for describing the preferences of multiple people. Thus, a set of users $U$ will be introduced with
|
||||
\begin{equation}\label{eq:Foundations:ProductConfiguration:Users}
|
||||
U = \{1, \dotsc, n\},
|
||||
\end{equation}
|
||||
@@ -96,7 +96,7 @@ Due to the fact that a thesis has limited resources, some assumptions have to be
|
||||
\item Speed and optimization of the system does not have high priority.
|
||||
\item The user interface is for demo purposes.
|
||||
\end{itemize}
|
||||
In order to reduce complexity, the assumptions are made that only single value attributes are supported and one product is configured at a time by a group. Therefore, less implementation work is needed. However, results are not dependent on these implementations. Most features use only single value attributes. It is also possible to model a feature that supports multiple attributes. As an example the feature \emph{audio system} has the following options: \emph{Bluetooth} or \emph{CD}. To allow multiple attributes to be selected the characteristics \emph{Bluetooth and CD} can be added.
|
||||
In order to reduce complexity, the assumptions are made that only single value attributes are supported and one product is configured at a time by a group. Consequently, less implementation work is needed. However, results are not dependent on these implementations. Most features use only single value attributes. It is also possible to model a feature that supports multiple attributes. As an example the feature \emph{audio system} has the following options: \emph{Bluetooth} or \emph{CD}. To allow multiple attributes to be selected the characteristics \emph{Bluetooth and CD} can be added.
|
||||
The assumption that users join the system and only start configuring once all members of the group joined is made to reduce the amount of work needed for creating a lobby and waiting system which is not relevant for showing the functionality of the recommender. Speed and optimization of the system having no high priority as the speed has no influence on the results of this thesis. Therefore, as long as the system is fast enough, no further optimisation of speed is needed.
|
||||
The evaluation is done solely with the recommender system and its APIs. Thus, the user interface is not directly relevant for the evaluation but it is relevant for communication of results.
|
||||
|
||||
@@ -228,7 +228,7 @@ where $aggr$ the aggregation function and $score_{user}(P_i, s)$ the configurati
|
||||
score_{user}(P_i, s) = average(\{x \ | \ (characteristic, x) \in P_i \land characteristic \in s \}) \notag .
|
||||
\end{equation}
|
||||
|
||||
The example in \autoref{fig:Concept:ForestExample} contains two users. The first user has preferences for the characteristic \emph{manual} of the feature with $0.8$ and the characteristic \emph{harvester} of the same feature with $0.3$. All other characteristics have a preference of $0.5$. The second user's preferences are $0.5$ for all characteristics. The finished configuration that is supposed to be rated in this example contains the characteristics \emph{low} for each feature except for \emph{effort} and \emph{quantity} which are set to \emph{manual} and \emph{high}. The score fore the finished configuration $S_F$ of user one is $0.54$. This score is the average of all seven features. User one rates all characteristics of all features as $0.5$ except two characteristics for \emph{effort}. Therefore all, feature scores for this user are $0.5$ except the score for \emph{effort} is $0.8$ because of the user's preference of $0.8$ for the characteristic \emph{manual}. The resulting average score per feature of $0.54$ is the user's score for this configuration. User two rates all characteristics with $0.5$ therefore the resulting average is $0.5$.
|
||||
The example in \autoref{fig:Concept:ForestExample} contains two users. The first user has preferences for the characteristic \emph{manual} of the feature with $0.8$ and the characteristic \emph{harvester} of the same feature with $0.3$. All other characteristics have a preference of $0.5$. The second user's preferences are $0.5$ for all characteristics. The finished configuration that is supposed to be rated in this example contains the characteristics \emph{low} for each feature except for \emph{effort} and \emph{quantity} which are set to \emph{manual} and \emph{high}. The score fore the finished configuration $S_F$ of user one is $0.54$. This score is the average of all seven features. User one rates all characteristics of all features as $0.5$ except two characteristics for \emph{effort}. Thus, all feature scores for this user are $0.5$ except the score for \emph{effort} is $0.8$ because of the user's preference of $0.8$ for the characteristic \emph{manual}. The resulting average score per feature of $0.54$ is the user's score for this configuration. User two rates all characteristics with $0.5$ therefore the resulting average is $0.5$.
|
||||
The group configuration score is dependent on the used aggregation strategy. Multiplication results in a score of $0.54 \cdot 0.5 = 0.27$. The score for average is $\frac{1}{2}(0.54 + 0.5) = 0.52$ and for least misery $\min \{0.54, 0.5\} = 0.5$.
|
||||
|
||||
\subsection{Configuration Change Penalty}
|
||||
@@ -248,7 +248,7 @@ In this thesis a penalty function is proposed which gives the percentage of char
|
||||
\end{equation}
|
||||
In essence the the function checks the number of unchanged characteristics and divides this by the number of characteristics that are in the current configuration state. The result is the proportion of unchanged characteristics when comparing the current configuration state to the finished configuration.
|
||||
|
||||
By including the current configuration state, the scoring function can take into account that some characteristics have already been realized and therefore might be very costly to change. A higher $\alpha$ resembles a higher cost of change and an alpha of zero represents no costs for changes.
|
||||
By including the current configuration state, the scoring function can take into account that some characteristics have already been realized and accordingly might be very costly to change. A higher $\alpha$ resembles a higher cost of change and an alpha of zero represents no costs for changes.
|
||||
|
||||
\section{Illustration}
|
||||
\label{sec:Concept:Illustration}
|
||||
|
||||
@@ -16,7 +16,7 @@ This thesis requires a group configuration system with a recommender. In this th
|
||||
\section{Microservice}
|
||||
\label{sec:DesignImplementation:Microservice}
|
||||
|
||||
An existing base system can be extended in different directions. As M.Collab already functions as middleman between M.Collab-Customer and M.Core, one solution would be to add recommender functionality to M.Collab-Customer. However, this approach would conflict with the single responsibility and separation of concern software design principles. It would change M.Collab from a system that works similar to a router to one that additionally handles recommendations. M.Collab manages communication between M.Collab-Customer instances and between M.Core and M.Collab-Customer. Therefore this solution was discarded.
|
||||
An existing base system can be extended in different directions. As M.Collab already functions as middleman between M.Collab-Customer and M.Core, one solution would be to add recommender functionality to M.Collab-Customer. However, this approach would conflict with the single responsibility and separation of concern software design principles. It would change M.Collab from a system that works similar to a router to one that additionally handles recommendations. M.Collab manages communication between M.Collab-Customer instances and between M.Core and M.Collab-Customer. Thus this solution was discarded.
|
||||
Another viable solution is adding the recommendation functionality to M.Core. The main benefit of this approach is that it allows the recommender to use systems that are used for solving constraint satisfaction problems to enhance recommendations. The disadvantage of this approach is, however, that now the recommender is tightly coupled with the configurator. Not only does this mean that reproducing results in this thesis is only possible with access to a non-open-source product but it also means that there is no possibility to use the recommender for other configuration solutions. Moreover, no plugin API exists and, as a result the extension would require a large effort.
|
||||
Due to these reasons it was decided to create a new microservice, called M.Recommend which communicates with M.Collab, and thereby it is loosely coupled with other system components. Additionally, this approach allows to add functionality that takes advantage of a constraint satisfaction problem solver by either communicating with one via an API or by integrating one directly into the recommender component M.Recommend. Moreover, it also allows the use of different technologies instead of JavaScript and Node.js. Using a separate component also allows to scale the system differently as it is possible to use multiple recommender instances for one configurator instance or one recommender instance with multiple configurator instances. This is an advantage in the age of automatically scaling systems.
|
||||
|
||||
@@ -55,7 +55,7 @@ Data is stored and retrieved using data access objects. Therefore, the currently
|
||||
\section{Database}
|
||||
\label{sec:DesignImplementation:Database}
|
||||
|
||||
The choice among database systems has to be made between \emph{non-relational} and \emph{relational} databases. Relational databases are good in regard to at the four ACID (atomicity, consistency, isolation, durability) principles \cite{chrysanthis1998recovery, cookACIDBASEDatabase2009}. Moreover, if the data structures are not changing it provides a solid basis that keeps the data reliable. A non-relational database on the other hand is ideal for rapid agile development. Moreover, it excels if data requirements are not entirely clear and if a large amount of unstructured data has to be stored. Moreover, non-relational databases allow the system to store the data in any kind of structure. This proves an advantage as it allows to use the same data structure to be stored that also has to be fed out through the api. Therefore parsing methods for the API can be reused and altered upon changing requirements. Moreover, the data used for the recommender is mostly not interconnected. Therefore a relational databases' main strength, the data structure, does not really come into play here. Thus, in this thesis a NoSQL database is used.
|
||||
The choice among database systems has to be made between \emph{non-relational} and \emph{relational} databases. Relational databases are good in regard to at the four ACID (atomicity, consistency, isolation, durability) principles \cite{chrysanthis1998recovery, cookACIDBASEDatabase2009}. Moreover, if the data structures are not changing it provides a solid basis that keeps the data reliable. A non-relational database on the other hand is ideal for rapid agile development. Moreover, it excels if data requirements are not entirely clear and if a large amount of unstructured data has to be stored. Moreover, non-relational databases allow the system to store the data in any kind of structure. This proves an advantage as it allows to use the same data structure to be stored that also has to be fed out through the api. Hence, parsing methods for the API can be reused and altered upon changing requirements. Moreover, the data used for the recommender is mostly not interconnected. Therefore a relational databases' main strength, the data structure, does not really come into play here. Thus, in this thesis a NoSQL database is used.
|
||||
|
||||
\section{Scoring Functions}
|
||||
\label{sec:DesignImplementation:ScroingFunctions}
|
||||
|
||||
Reference in New Issue
Block a user