Introduction: Functional capacity tests are standardized instruments for evaluating patients' capacity to perform work-related activities. Functional capacity test results are associated with biopsychosocial factors, which makes it unclear what capacity testing actually measures. An overview of these factors was missing. The objective of this review was to investigate the level of evidence for factors that are associated with functional capacity test results in patients with non-specific chronic low back pain.
Methods: A systematic literature review was performed. Relevant studies were identified through a search of electronic journal databases. Candidate studies employed a cross-sectional or randomized controlled trial (RCT) design and were published between 1980 and October 2010. The quality of these studies was determined, and the level of evidence was reported for factors that were investigated for an association with capacity results in at least 3 studies.
Results: Twenty-two studies were included. The level of evidence was reported for lifting low, lifting high, carrying, and static lifting capacity. Lifting low test results were associated with self-reported disability and specific self-efficacy, but not with pain duration. There was conflicting evidence for associations of lifting low with pain intensity, fear of movement/(re)injury, depression, gender, and age. Lifting high was associated with gender and specific self-efficacy, but not with pain intensity or age. There was conflicting evidence for associations of lifting high with self-reported disability, pain duration, and depression. Carrying was associated with self-reported disability, but not with pain intensity. There was conflicting evidence for associations of carrying with specific self-efficacy, gender, and age. Static lifting was associated with fear of movement/(re)injury.
Conclusions: Much heterogeneity was observed in the capacity tests and candidate associated factors that were investigated. There was some evidence for biological and psychological factors that are or are not associated with capacity results, but there was also much conflicting evidence. High-level evidence for social factors was absent.