OBJECTIVES: There is increasing interest in new quantitative uptake metrics beyond the standardized uptake value (SUV) in oncology PET/CT studies. The purpose of this study was to investigate the variability and test-retest repeatability (TRT) of metabolically active tumor volume (MATV) measurements and several other new quantitative metrics in non-small cell lung cancer (NSCLC) using [18F]FDG PET/CT with different segmentation methods, user interactions, uptake intervals, and reconstruction protocols.
METHODS: Ten patients with advanced NSCLC underwent two whole-body [18F]FDG PET/CT scans at both 60 and 90 min post-injection. PET data were reconstructed with four different protocols. Eight segmentation methods were applied to delineate lesions with and without a tumor mask. MATV, maximum and mean SUV (SUVmax, SUVmean), total lesion glycolysis (TLG), and intralesional heterogeneity features were derived. Variability and repeatability were evaluated using a generalized estimating equations statistical model with Bonferroni correction for multiple comparisons. The statistical model, which included the interaction between uptake interval and reconstruction protocol, was applied separately to the data obtained from each segmentation method.
RESULTS: Without masking, none of the segmentation methods delineated all lesions correctly. MATV was affected by both uptake interval and reconstruction settings for most segmentation methods. Similar observations were made for the uptake metrics SUVmax, SUVmean, TLG, homogeneity, entropy, and zone percentage. No effect of uptake interval was observed on TRT metrics, while the reconstruction protocol affected the TRT of SUVmax. Overall, segmentation methods showing poor quantitative performance in one condition showed better performance in other (combined) conditions. For some metrics, a clear statistical interaction was found between the segmentation method and both uptake interval and reconstruction protocol.
CONCLUSION: All segmentation results need to be reviewed critically. MATV and other quantitative uptake metrics, as well as their TRT, depend on the segmentation method, uptake interval, and reconstruction protocol. To obtain quantitatively reliable metrics with good TRT performance, the optimal segmentation method depends on the local imaging procedure, the PET/CT system, and/or the reconstruction protocol. Rigid harmonization of imaging procedures and PET/CT performance will help mitigate this variability.
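
For readers unfamiliar with the metrics named above, the following is a minimal illustrative sketch, not the study's implementation. It assumes a simple fixed-threshold segmentation at a fraction of SUVmax (the abstract does not name the eight methods used, so the 50% threshold is a hypothetical choice), takes TLG as SUVmean multiplied by MATV, and expresses test-retest repeatability as a relative percent difference between two scans, which is one common convention and is assumed here. The function names and the voxel values are invented for illustration.

```python
from statistics import mean


def lesion_metrics(suv_voxels, voxel_volume_ml, threshold_fraction=0.5):
    """Derive MATV, SUVmax, SUVmean and TLG from voxel SUVs inside a tumor mask.

    Assumption: fixed-threshold segmentation at `threshold_fraction` of SUVmax.
    This stands in for the (unnamed) segmentation methods of the study.
    """
    suv_max = max(suv_voxels)
    cutoff = threshold_fraction * suv_max
    # Voxels at or above the cutoff form the segmented lesion.
    segmented = [v for v in suv_voxels if v >= cutoff]
    matv = len(segmented) * voxel_volume_ml   # metabolically active tumor volume (mL)
    suv_mean = mean(segmented)                # mean SUV within the segmented volume
    tlg = suv_mean * matv                     # total lesion glycolysis
    return {"MATV": matv, "SUVmax": suv_max, "SUVmean": suv_mean, "TLG": tlg}


def percent_difference(test, retest):
    """Relative test-retest difference in percent (assumed TRT convention)."""
    return 100.0 * (retest - test) / ((test + retest) / 2.0)


# Hypothetical voxel SUVs for the same lesion on test and retest scans.
test_scan = lesion_metrics([2.0, 4.0, 6.0, 8.0], voxel_volume_ml=0.5)
retest_scan = lesion_metrics([2.0, 5.0, 7.0, 10.0], voxel_volume_ml=0.5)

for name in ("MATV", "SUVmax", "SUVmean", "TLG"):
    print(name, round(percent_difference(test_scan[name], retest_scan[name]), 1))
```

Because the threshold is tied to SUVmax, a change in SUVmax caused by uptake interval or reconstruction settings shifts the cutoff and can therefore change MATV, SUVmean, and TLG together, which is consistent with the dependence reported above.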