AI is being applied to system design tasks, including generating and validating technical specifications, architecture decision records (ADRs), and other design documents.
AI for LLM-Enhanced Recommender System Design
Scaling recommender systems with large language models (LLMs) presents challenges in aligning the LLM's semantic space with the recommender's ID space (Taiji: Pareto Optimal Policy Optimization).
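One common way to pose this alignment, not necessarily the one Taiji uses, is a contrastive objective: project each item's collaborative ID embedding into the LLM's embedding space with a learned matrix and train it so that each item lands closest to its own text embedding. The minimal sketch below computes an InfoNCE loss for that setup; the projection `W`, the temperature `tau`, and the toy vectors are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def project(W: List[Vector], x: Vector) -> Vector:
    """Map an ID embedding into the LLM embedding space (rows of W = output dims)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)  # epsilon guards against zero vectors


def info_nce(id_embs: List[Vector], text_embs: List[Vector],
             W: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch: item i's projected ID embedding should be
    closer to text_embs[i] than to any other item's text embedding."""
    total = 0.0
    for i, id_emb in enumerate(id_embs):
        p = project(W, id_emb)
        logits = [cosine(p, t) / tau for t in text_embs]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]  # -log softmax probability of the matching text
    return total / len(id_embs)


# Toy usage: identity projection and 2-d embeddings (all values hypothetical).
W = [[1.0, 0.0], [0.0, 1.0]]
ids = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(ids, [[1.0, 0.0], [0.0, 1.0]], W)
swapped = info_nce(ids, [[0.0, 1.0], [1.0, 0.0]], W)
print(aligned < swapped)  # True: matching ID/text pairs give the lower loss
```

In a real system `W` would be trained by gradient descent on this loss; the toy call only shows that correctly paired ID and text embeddings score a much lower loss than mismatched ones.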
Existing LLM-enhanced recommendation (LLM4Rec) paradigms struggle to measure and improve Chain-of-Thought (CoT) quality during supervised fine-tuning (SFT) (Taiji: Pareto Optimal Policy Optimization).
These paradigms also often neglect the trade-off between LLM semantic rewards and recommendation preference rewards during reinforcement learning (RL) alignment (Taiji: Pareto Optimal Policy Optimization).
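The trade-off can be made concrete with Pareto dominance. Treat each candidate policy as a pair of scores (semantic reward, recommendation preference reward); a policy is on the Pareto front if no other candidate is at least as good on both scores and strictly better on one. This is a generic illustration with made-up scores, not part of Taiji itself.

```python
from typing import List, Tuple

# (semantic_reward, preference_reward) for one candidate policy; toy values only.
RewardPair = Tuple[float, float]


def dominates(a: RewardPair, b: RewardPair) -> bool:
    """True if a is at least as good as b on both rewards and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[RewardPair]) -> List[RewardPair]:
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
print(pareto_front(candidates))  # [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9)]
```

Here (0.5, 0.5) is dropped because (0.6, 0.6) beats it on both rewards, while the three survivors each give up one reward to gain the other. Picking among them is the weighting question that RL alignment has to answer.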
To overcome the SFT bottleneck, Taiji uses reverse-engineered reasoning and open-ended rejection sampling to generate high-quality, domain-specific CoT data (Taiji: Pareto Optimal Policy Optimization).
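The general shape of rejection sampling for CoT data can be sketched as follows, assuming that "reverse-engineered" means conditioning the generator on the item the user actually engaged with and that a trace is accepted only if its final recommendation matches that item. Taiji's actual acceptance criteria are not specified here; `Trace`, `rejection_sample_cot`, and `toy_generator` are hypothetical names, and the toy generator stands in for an LLM call.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    reasoning: str
    predicted_item: str


# Hypothetical generator interface: (history, target_hint, rng) -> Trace.
# In practice this would wrap an LLM call; here it is a plain function.
Generator = Callable[[List[str], str, random.Random], Trace]


def rejection_sample_cot(history: List[str], target_item: str, generate: Generator,
                         num_samples: int = 8, seed: int = 0) -> List[Trace]:
    """Sample several rationales conditioned on the observed target item and keep
    only those whose final recommendation matches it (a simple exact-match filter)."""
    rng = random.Random(seed)
    kept: List[Trace] = []
    for _ in range(num_samples):
        trace = generate(history, target_item, rng)
        if trace.predicted_item == target_item:
            kept.append(trace)
    return kept


def toy_generator(history: List[str], hint: str, rng: random.Random) -> Trace:
    # Stand-in for an LLM: follows the hint half the time, otherwise drifts.
    item = hint if rng.random() < 0.5 else rng.choice(history)
    return Trace(f"User recently engaged with {history[-1]}; suggest {item}.", item)


sft_data = rejection_sample_cot(["running shoes", "socks"], "insoles", toy_generator,
                                num_samples=20)
```

The accepted traces in `sft_data` would become SFT examples. A production filter would likely also score reasoning quality, which this exact-match check does not attempt.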
To address the RL trade-off, Taiji's Pareto Optimal Policy Optimization (POPO) adaptively adjusts cross-domain reward weights to reach a Pareto-optimal trade-off between LLM semantic world knowledge and collaborative ID features (Taiji: Pareto Optimal Policy Optimization).
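POPO's exact update rule is not given here. As a named stand-in, the sketch below shows one well-known way to weight two objectives adaptively toward a Pareto-stationary point: the two-objective min-norm rule from multiple-gradient descent (MGDA). Given a gradient of the semantic reward and a gradient of the preference reward with respect to the same parameters, it picks the convex weights whose combined gradient has the smallest norm. The function names and toy gradients are hypothetical.

```python
from typing import List, Tuple


def min_norm_weights(g_sem: List[float], g_pref: List[float]) -> Tuple[float, float]:
    """Two-objective MGDA rule: choose alpha in [0, 1] minimising
    ||alpha * g_sem + (1 - alpha) * g_pref||^2. Closed form:
    alpha = (g_pref - g_sem) . g_pref / ||g_sem - g_pref||^2, clipped to [0, 1]."""
    denom = sum((a - b) ** 2 for a, b in zip(g_sem, g_pref))
    if denom == 0.0:
        return 0.5, 0.5  # identical gradients: every weighting gives the same direction
    alpha = sum((b - a) * b for a, b in zip(g_sem, g_pref)) / denom
    alpha = min(1.0, max(0.0, alpha))
    return alpha, 1.0 - alpha


def combined_gradient(g_sem: List[float], g_pref: List[float]) -> List[float]:
    """Weighted update direction; it has a non-negative inner product with both gradients."""
    w_sem, w_pref = min_norm_weights(g_sem, g_pref)
    return [w_sem * a + w_pref * b for a, b in zip(g_sem, g_pref)]


# Toy gradients (hypothetical): the semantic reward currently has a much larger gradient.
g_sem, g_pref = [3.0, 0.0], [0.0, 1.0]
print(min_norm_weights(g_sem, g_pref))  # approximately (0.1, 0.9)
d = combined_gradient(g_sem, g_pref)
```

With these gradients the larger semantic gradient is down-weighted to about 0.1, so the update improves both rewards equally rather than being dominated by one. Recomputing the weights at every step is what makes the weighting adaptive.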
Taiji has been deployed on Kuaishou's advertising platform, where it serves over 400 million users daily and has yielded significant commercial revenue (Taiji: Pareto Optimal Policy Optimization).