In autonomous vehicles, forecasting the future trajectories of surrounding dynamic objects is essential for safe and precise behavioral planning. Accurate forecasting is especially critical for preventing accidents involving vulnerable road users such as pedestrians. Typically, pedestrians moving together follow similar trajectories, whereas pedestrians approaching each other from opposite directions act to avoid collisions. Pedestrians in a scene thus construct and follow social norms. This article therefore uses a cross-attention mechanism based on the trajectory and velocity information of each pedestrian to extract social interactions between objects in congested scenes and to model these social norms. We propose a handcrafted trajectory and velocity tree that generates candidate trajectories, and we forecast feasible trajectories by integrating these candidates with the social interactions. The proposed method outperforms existing methods that rely solely on implicit latent variables for forecasting future trajectories, and it offers high interpretability because the trees explicitly depict the candidate trajectories. The proposed network structure also opens possibilities for further research on predicting velocity profiles. To validate the inference performance of the proposed method, we conduct quantitative and qualitative experiments on pedestrian datasets, including ETH, UCY, and the Stanford Drone Dataset. These experiments demonstrate the superiority of the proposed ITCA. We also perform ablation studies to quantify the contribution of each module. To our knowledge, ITCA outperforms state-of-the-art approaches using only contextual data.
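
As a rough illustration of the kind of cross-attention over trajectory and velocity information described above, the following minimal sketch applies generic scaled dot-product attention from one target pedestrian to its neighbors. It is not the ITCA architecture: the function names, the feature layout (last observed position plus a finite-difference velocity), the time step `dt`, and the omission of learned projection matrices are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def pedestrian_features(positions, dt=0.4):
    """Build a per-pedestrian feature vector (hypothetical layout).

    positions: array of shape (n_ped, t_obs, 2) with observed x/y coordinates.
    dt: assumed sampling interval in seconds (illustrative value).
    Returns an array of shape (n_ped, 4): last position and last velocity.
    """
    velocity = (positions[:, -1] - positions[:, -2]) / dt
    return np.concatenate([positions[:, -1], velocity], axis=-1)


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention without learned weights.

    queries: (n_q, d), keys: (n_k, d), values: (n_k, d_v).
    Returns the attended context (n_q, d_v) and the weights (n_q, n_k).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


# Synthetic scene: 5 pedestrians observed for 8 time steps.
rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 8, 2))
feats = pedestrian_features(positions)

# Pedestrian 0 is the target (query); the other four act as keys and values.
context, weights = cross_attention(feats[:1], feats[1:], feats[1:])
```

Here each row of `weights` sums to one and can be read as how strongly the target attends to each neighbor. A trained model would first map the features through learned query, key, and value projections, which this sketch leaves out.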