Classifying low-concentration gas sensor array (GSA) data is difficult because of low signal-to-noise ratio (SNR), sensor heterogeneity, drift, and small sample sizes. We benchmark time-series-to-image (TS2I) CNNs against time-series classifiers after first reproducing a strong FE-ELM baseline under a shared fold manifest. On the GSA-LC and GSA-FM datasets, we compare FE-ELM, vector baselines, time-series methods, and TS2I-CNNs using 20 × 5 repeated stratified cross-validation (<i>n</i> = 100 folds). ROCKET achieves the highest accuracy on both datasets and is significantly better than TCN and MiniROCKET (paired tests with Holm-Bonferroni correction, <i>p</i> < 0.05): on GSA-FM, accuracy is 0.9721 ± 0.0480 (95% CI [0.9627, 0.9815]) with Macro-F1 0.9757; on GSA-LC, it is 0.9578 ± 0.0433 (95% CI [0.9493, 0.9663]) with Macro-F1 0.9555. Among image-based models, CNN-RP is the most robust, whereas CNN-GASF lags, especially on GSA-LC. The benefit of RGB fusion (e.g., with MTF) is dataset-dependent and inconsistent, and transfer learning with ResNet-18 offers no consistent advantage. Overall, ROCKET ranks first across folds, while CNN-RP is the most reliable TS2I alternative under low-concentration conditions. These results provide a reproducible, fair benchmark for e-nose applications and practical guidance for model selection, and they clarify both the potential and the limitations of TS2I.
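
The abstract does not state how the 95% confidence intervals were computed. As a reader's check, the reported intervals appear consistent with a normal approximation, mean ± 1.96 · SD / √n, taking n = 20 × 5 = 100 folds and assuming the "±" values are standard deviations across folds. The minimal sketch below reproduces the ROCKET intervals under that assumption; the function name `normal_ci` and the dictionary layout are illustrative and not taken from the paper. Because folds from repeated cross-validation overlap and are not independent, such an interval may be somewhat optimistic.

```python
import math


def normal_ci(mean, sd, n, z=1.96):
    """Normal-approximation interval: mean +/- z * sd / sqrt(n).

    Assumption (not stated in the abstract): `sd` is the standard
    deviation of per-fold accuracy and the n folds are treated as
    independent.
    """
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# ROCKET accuracy (mean, SD) as reported in the abstract.
reported = {
    "GSA-FM": (0.9721, 0.0480),
    "GSA-LC": (0.9578, 0.0433),
}
n_folds = 20 * 5  # 20 repeats of 5-fold stratified cross-validation

for name, (mean, sd) in reported.items():
    low, high = normal_ci(mean, sd, n_folds)
    print(f"{name}: 95% CI [{low:.4f}, {high:.4f}]")
# GSA-FM: 95% CI [0.9627, 0.9815]
# GSA-LC: 95% CI [0.9493, 0.9663]
```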