Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention | Prof. Youngmin Ro's Lab | Department of Artificial Intelligence, University of Seoul

preprint | green | Citations: 0 | 2026

Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention

Dongheon Lee, Seokju Yun, Jaegyun Im, Youngmin Ro

arXiv (Cornell University)

Abstract

Recent Super-Resolution (SR) methods mainly adopt Transformers for their strong long-range modeling capability and exceptional representational capacity. However, most SR Transformers rely heavily on relative positional bias (RPB), which prevents them from leveraging hardware-efficient attention kernels such as FlashAttention. This limitation imposes a prohibitive computational burden during both training and inference, severely restricting attempts to scale SR Transformers by enlarging the training patch size or the self-attention window. Consequently, unlike other domains that actively exploit the inherent scalability of Transformers, SR Transformers remain heavily focused on effectively utilizing limited receptive fields. In this paper, we propose Rank-factorized Implicit Neural Bias (RIB), an alternative to RPB that enables FlashAttention in SR Transformers. Specifically, RIB approximates positional bias using low-rank implicit neural representations and concatenates them with pixel content tokens in a channel-wise manner, turning the element-wise bias addition in attention score computation into a dot-product operation. Further, we introduce a convolutional local attention and a cyclic window strategy to fully leverage the advantages of long-range interactions enabled by RIB and FlashAttention. We enlarge the window size up to 96 × 96 while jointly scaling the training patch size and the dataset size, maximizing the benefits of Transformers in the SR task. As a result, our network achieves 35.63 dB PSNR on Urban100 ×2, while reducing training and inference time by 2.1× and 2.9×, respectively, compared to the RPB-based SR Transformer (PFT).
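
The step the abstract describes, turning an additive positional bias into part of the dot product by concatenating low-rank factors along the channel axis, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the factors `bq` and `bk` are random stand-ins for what the paper obtains from implicit neural representations, and the sizes `n`, `d`, `r` and the choice to pre-scale the queries are assumptions made only for this illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d, r = 16, 8, 4  # tokens per window, head dim, bias rank (hypothetical sizes)

q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))

# Stand-in low-rank positional factors (assumption: random values here;
# the paper derives them from implicit neural representations).
bq = rng.standard_normal((n, r))
bk = rng.standard_normal((n, r))
bias = bq @ bk.T  # rank-r positional bias of shape (n, n)

scale = 1.0 / np.sqrt(d)

# Reference path: scores plus an explicit element-wise bias addition.
ref = softmax(q @ k.T * scale + bias) @ v

# Concatenated path: the bias factors ride along as extra channels, so a
# single bias-free dot product yields q.k * scale + bq.bk.
# Pre-scaling q keeps the bias term unscaled (an assumption of this sketch).
q_cat = np.concatenate([q * scale, bq], axis=-1)
k_cat = np.concatenate([k, bk], axis=-1)
out = softmax(q_cat @ k_cat.T) @ v

max_err = np.abs(ref - out).max()
print("max abs difference:", max_err)
```

Because the concatenated path needs no separate bias tensor, it has the form that a fused attention kernel without bias support can consume. The sketch only checks that identity numerically; it says nothing about the speedups reported in the abstract.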

Keywords

Transformer, Scalability, Inference, Computation, Exploit, Leverage (statistics), Scaling, Convolutional neural network

Type

preprint

IF / Citations

- / 0

Full text

https://doi.org/10.48550/arxiv.2603.06738

Publication year

2026
