Spotlight on… Psychometric Tests (Reliability and Validity)

Wednesday, January 6th, 2016

As independent suppliers of psychometric testing services, at Bloojam we are able to offer a wide range of reputable and effective psychometric tests. There are also a number of tests in the marketplace that we are not willing to offer, because they do not meet the basic quality requirements of reliability and validity and therefore do not measure what they claim to.

See the Test User Guidelines provided by the British Psychological Society (BPS) for more information.

Here’s an analogy:

Imagine a shop is selling a ruler that changes size in different temperatures and has differently spaced distance markers along its length.

You wouldn’t buy the ruler as it would give you inconsistent – or unreliable – measurements each time you used it. You would also not be able to measure the length of an object – it would give you an invalid measurement.

A reputable and high-quality psychometric test will have an accompanying manual that provides both reliability and validity data. If there’s no reliability and validity data, don’t waste your money!


A reliable test is one that measures consistently across time, individuals, and situations:

  • Internal consistency reliability (consistent performance of test questions or “items”)
  • Test-retest reliability (consistent performance if the same test-takers re-sit the test)
  • Alternate (parallel) forms reliability (consistent performance across two forms of the same test)
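To make the first of these concrete, here is a minimal sketch (using made-up scores, not data from any real test) of how internal consistency is often estimated, via Cronbach's alpha: the closer the result is to 1, the more consistently the items behave.

```python
# Illustrative sketch only: estimating internal consistency reliability
# (Cronbach's alpha) from hypothetical item-level scores.

def variance(xs):
    """Sample variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(item_scores):
    """item_scores: one list of scores per test item, all over the same people."""
    k = len(item_scores)                      # number of items
    n = len(item_scores[0])                   # number of test-takers
    # Each person's total score across all items.
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(it) for it in item_scores) / variance(totals))

# Hypothetical scores: 3 items answered by 5 people.
items = [[4, 3, 5, 2, 4],
         [5, 3, 4, 2, 5],
         [4, 2, 5, 3, 4]]
print(round(cronbach_alpha(items), 2))
```

Test-retest and alternate-forms reliability are estimated similarly, by correlating the two sets of scores rather than examining the items within a single sitting.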


A valid test is one that measures what it is intended to measure:

A test has to be reliable (consistent and precise) in order for it to be valid (the changing size of your dodgy ruler and its inconsistent spacings are never going to be able to measure the length of an object effectively).

However, reliability alone does not guarantee validity (even if the ruler’s size stays the same and the distance markings are equally spaced, if those markings are not set the correct distance apart the ruler cannot give you an accurate measurement of length). In this case you would have internal consistency and test-retest reliability, but not validity.

Test manuals should provide a series of research studies to help build up a picture of a test’s validity in certain contexts.  Each study will provide validity correlations comparing, for example, test scores with training or job performance.  Consider whether the contexts are relevant to your own; just because a test has been shown to be valid for mechanics doesn’t mean it’s also valid for Customer Service advisors.

Key types of validity are:

  • Content validity – how well the content of the test can be linked to the content of the job; usually relies on a thorough job analysis.
  • Predictive validity – data showing how well test scores predict later job or training performance, usually gathered through a range of predictive validity studies.
  • Concurrent validity – data showing how well test scores relate to other scores obtained at the same time (e.g. supervisors’ ratings).
  • Construct validity – data showing whether the test behaves as the theory behind it says it should, usually by linking test scores with those of other related tests.
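A validity coefficient of this kind is simply a correlation. As a rough sketch (with entirely hypothetical scores, not data from any real validity study), a predictive validity coefficient relates test scores to a criterion measured later, such as job-performance ratings:

```python
# Illustrative sketch only: a predictive validity coefficient is the Pearson
# correlation between test scores and a later criterion measure.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical sample: ability-test scores at hire, and performance
# ratings collected some months later for the same eight people.
test_scores = [22, 35, 28, 41, 30, 25, 38, 33]
performance = [3.1, 4.2, 3.0, 4.5, 3.8, 2.9, 4.0, 3.6]
print(round(pearson(test_scores, performance), 2))
```

The same calculation underlies concurrent and construct validity evidence; what changes is which two sets of scores are being compared.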

As a rough guide*, uncorrected** correlation values for ability tests can be interpreted as follows:

+0.2 – better than “chance” prediction

+0.3 – significantly better than “chance” prediction

+0.4 – excellent prediction

*How large a correlation needs to be to reach statistical significance depends on the sample size of the study. Check whether the correlation values are reported as significant at the 0.01 (1%) or 0.05 (5%) level.

**If the manual lists corrected correlation values, these will typically be higher than the uncorrected values.
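The rough guide above can be expressed as a small helper; the cut-offs and the first three labels come straight from the guide, while the below-0.2 label is our own wording for values the guide does not cover.

```python
# Illustrative sketch only: the rough-guide bands for uncorrected
# ability-test validity correlations, as a simple lookup.

def interpret_uncorrected_r(r):
    """Classify an uncorrected validity correlation per the rough guide above."""
    if r >= 0.4:
        return "excellent prediction"
    if r >= 0.3:
        return 'significantly better than "chance" prediction'
    if r >= 0.2:
        return 'better than "chance" prediction'
    return "little evidence of prediction"

print(interpret_uncorrected_r(0.35))  # prints: significantly better than "chance" prediction
```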


Finally, a high-quality psychometric test should also show ‘face validity’, i.e. the test-taker should be able to see that the test is relevant and appropriate to their role and that it looks like it measures what it is supposed to; otherwise they, and other stakeholders, may question its use.