Proximal policy optimization based hybrid recommender systems for large scale recommendations
Abstract
Recommender systems have become increasingly popular due to the significant rise in digital information on the internet in recent years. They provide personalized recommendations by selecting a few items from a large set of items. However, as the number of items and users grows, scalability remains a key challenge for recommender systems. Policy gradient algorithms such as Proximal Policy Optimization (PPO) are effective in large action spaces (a large number of items) because they learn the optimal policy directly from samples; however, most existing policy gradient approaches for recommendation suffer from high variance, which destabilizes the learning process. We model the collaborative filtering process as a Markov Decision Process and train our reinforcement learning agent with the PPO algorithm. PPO uses the actor-critic framework and thus mitigates the high variance of policy gradient algorithms; PPO methods are today considered among the most effective reinforcement learning methods, achieving state-of-the-art performance and even outperforming deep Q-learning methods. Further, we address the cold-start issue in collaborative filtering with autoencoder-based content filtering. In this paper, we propose a switching hybrid recommender system that combines these two recommendation techniques: it switches between them depending on some criterion, so that each constituent recommender's shortfall in a particular situation is covered by its counterpart. We show that our method outperforms various baseline methods on the popular MovieLens datasets for different evaluation metrics. On MovieLens 1M, our method outperforms the baseline by 9.19% in terms of R@10 and by 3.86% and 6.58% in terms of P@10 and P@20, respectively.
For the MovieLens 100K dataset, our method improves on the baseline methods by 4.10% in terms of P@10 and by 3.90% and 2.40% in terms of R@10 and R@20, respectively. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
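The variance-control mechanism the abstract refers to is PPO's clipped surrogate objective, which bounds how far a single update can move the policy. The sketch below is a minimal, generic illustration of that objective (not code from the paper); the function name `ppo_clip_loss` and the NumPy formulation are assumptions for illustration.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO (illustrative sketch).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s), the probability ratio
               between the new and old policies
    advantage: advantage estimate A_t, e.g. from the critic in an
               actor-critic setup
    eps:       clipping range epsilon

    Returns the per-sample loss to MINIMIZE (negative surrogate).
    """
    unclipped = ratio * advantage
    # Clipping the ratio to [1 - eps, 1 + eps] removes the incentive
    # to push the policy far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(unclipped, clipped)

# A ratio well above 1 + eps with positive advantage is clipped,
# so the gradient through it is bounded.
print(ppo_clip_loss(np.array([1.5]), np.array([1.0])))  # -> [-1.2]
```

In a recommendation setting, the action space is the item catalogue, so the same bounded update applies per recommended item; this is what makes PPO attractive when the action space is large.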