Polarization-reconfigurable (PR) antennas are a promising technology for next-generation wireless communications because they can reconfigure the wireless channel state. This paper integrates PR antenna technology into massive multiple-input multiple-output (MIMO) systems in the scenario where channel state information (CSI) is not available. We aim to jointly design the polarization and beamforming vectors at both transceivers, performing channel reconfiguration and beam alignment simultaneously to maximize the signal-to-noise ratio (SNR). However, jointly optimizing the polarization and beamforming vectors without perfect CSI is challenging. Depolarization increases the dimension of the CSI, while massive MIMO systems typically obtain only low-dimensional pilot measurements because of their limited number of radio frequency (RF) chains. This leads to high pilot overhead, because the transceivers can take only low-dimensional measurements of the high-dimensional channel. We achieve substantial PR beamforming gain with significantly reduced pilot overhead by employing a transformer-based framework at both transceivers that actively designs the polarization and beamforming vectors for the pilot stage and the transmission stage based on the sequence of received pilots. Numerical experiments demonstrate that the proposed framework significantly outperforms prevalent non-adaptive data-driven methods.