Designing a reward function is a central challenge in reinforcement learning: as environments become more complex and tasks grow more difficult, handcrafting a reward function that drives optimal behavior becomes increasingly hard. Preference-based reinforcement learning (PbRL) addresses this by learning a reward function from preferences between pairs of trajectories, eliminating the need for a handcrafted reward. In multi-agent reinforcement learning, the challenge is even greater: complex interactions among agents make designing a single global reward function still more difficult. In this paper, we show that when a single global reward function is learned via preference-based reinforcement learning in the multi-agent setting, it often fails to capture sufficient information for optimal policy learning. Instead, we propose learning individual reward functions that provide additional guidance toward each agent's optimal policy. Our approach, which combines graph structures with preference-based reinforcement learning, outperforms methods that learn a single global reward function.
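The trajectory-preference reward learning mentioned above is commonly instantiated with a Bradley-Terry model over trajectory returns, trained with a cross-entropy loss on preference labels. The sketch below illustrates that standard PbRL objective only; the function names (`traj_return`, `preference_prob`, `preference_loss`) are hypothetical and not taken from this paper's method:

```python
import math

def traj_return(reward_fn, trajectory):
    """Sum of predicted rewards over a trajectory of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in trajectory)

def preference_prob(reward_fn, traj_a, traj_b):
    """Bradley-Terry probability that traj_a is preferred over traj_b."""
    ra = traj_return(reward_fn, traj_a)
    rb = traj_return(reward_fn, traj_b)
    # Softmax over the two trajectory returns (numerically stable form).
    m = max(ra, rb)
    ea, eb = math.exp(ra - m), math.exp(rb - m)
    return ea / (ea + eb)

def preference_loss(reward_fn, traj_a, traj_b, label):
    """Cross-entropy on a preference label (1.0 means traj_a is preferred)."""
    p = preference_prob(reward_fn, traj_a, traj_b)
    eps = 1e-12  # guard against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over a dataset of labeled trajectory pairs fits the reward function to the annotator's preferences; in practice `reward_fn` is a neural network and the loss is optimized by gradient descent.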