Annex E - Laboratory Quality Assurance

Codified File

Validation Principle

OIV-MA-AS1-05 Principle of validation of routine methods with respect to reference methods


 

The OIV acknowledges the existence of methods for the analysis of wines in addition to those described in the Summary of International Methods of Analysis of Wines and Musts: usual methods, most often automated.  These methods are economically and commercially important because they make it possible to maintain a complete and efficient analytical framework around the production and marketing of wine.  Moreover, these methods allow the use of modern means of analysis and the development and adaptation of analytical techniques.

In order to allow laboratories to use these methods and to ensure their linkage to the methods described in the Summary, the OIV has decided to establish a plan for the evaluation and validation, by a laboratory, of an alternative usual method, mechanized or not, with respect to a reference method described in the Summary of International Methods of Analysis of Wines and Musts.

This principle, which is adapted to the particular situation of the analysis of wines and musts, takes its inspiration from international standards in current use and allows the laboratory to assess and validate its alternative method in two ways:

 

Collaborative Study

OIV-MA-AS1-07 Collaborative study

The purpose of the collaborative study is to give a quantified indication of the precision of a method of analysis, expressed as its repeatability r and reproducibility R.

 

Repeatability: the value below which the absolute difference between two single test results, obtained using the same method on identical test material under the same conditions (same operator, same apparatus, same laboratory and a short interval of time), may be expected to lie with a specified probability.

 

Reproducibility: the value below which the absolute difference between two single test results, obtained using the same method on identical test material under different conditions (different operators, different apparatus and/or different laboratories and/or different times), may be expected to lie with a specified probability.

The term "individual result" refers to the value obtained when the standardized test method is applied, once and in full, to a single sample. Unless otherwise stated, the probability is 95%.

 

General Principles

  • The method subjected to trial must be standardized, that is, chosen from the existing methods as the method best suited for subsequent general use.
  • The protocol must be clear and precise.
  • The number of laboratories participating must be at least ten.
  • The samples used in the trials must be taken from homogeneous batches of material.
  • The levels of the analyte to be determined must cover the concentrations generally encountered.
  • Those taking part must have good experience of the technique employed.
  • For each participant, all analyses must be conducted within the same laboratory by the same analyst.
  • The method must be followed as strictly as possible.  Any departure from the method described must be documented.
  • The experimental values must be determined under strictly identical conditions: on the same type of apparatus, etc.
  • They must be determined independently of each other and immediately after each other.
  • The results must be expressed by all laboratories in the same units, to the same number of decimal places.
  • Five replicate experimental values must be determined, free from outliers.  If an experimental value is an outlier according to the Grubbs test, three additional measurements must be taken.

Statistical Model

The statistical methods set out in this document are given for one level (concentration, sample).  If there are several levels, the statistical evaluation must be made separately for each.  If a linear relationship is found (y = bx or y = a + bx) between the repeatability r (or reproducibility R) and the mean concentration x̄, a regression of r (or R) may be run as a function of x̄.

The statistical methods given below assume normally distributed random values.

The steps to be followed are as follows:

A/ Elimination of outliers within a single laboratory by the Grubbs test.  Outliers are values which depart so far from the other experimental values that these deviations cannot be regarded as random, the causes of such deviations being unknown.

B/ Examine whether all laboratories are working to the same precision, by comparing variances by the Bartlett test and Cochran test.  Eliminate those laboratories for which statistically deviant values are obtained.

C/ Track down systematic errors in the remaining laboratories by analysis of variance, and identify extreme outlying values by the Dixon test.  Eliminate those laboratories for which the outlier values are significant.

D/ From the remaining figures, calculate the standard deviation of repeatability sr and the repeatability r, and the standard deviation of reproducibility sR and the reproducibility R.

Notation:

The following designations have been chosen:

m = number of laboratories

i = index (number) of the laboratory (i = 1, 2, …, m)

ni = number of individual values from the i-th laboratory

N = Σi ni = total number of individual values

xij = individual values of the i-th laboratory (j = 1, 2, …, ni)

x̄i = mean value of the i-th laboratory

x̄ = total mean value

si = standard deviation of the i-th laboratory

A/ Verification of outlier values within one laboratory

 

After determining five individual values x1, …, x5, a Grubbs test is performed in the laboratory to identify outlier values.

Test the null hypothesis whereby the experimental value x* with the greatest absolute deviation from the mean is not an outlier observation.

Calculate:

PG = |x* − x̄| / s

where x* is the suspect value, x̄ the mean and s the standard deviation of the individual values.

Compare PG with the corresponding value shown in Table 1 for P = 95%.

If PG < the tabulated value, x* is not an outlier and si can be calculated.

If PG > the tabulated value, x* is probably an outlier; therefore make a further three determinations.

Calculate the Grubbs test for x* with the eight determinations.

If PG > the corresponding value for P = 99%, regard x* as a deviant value and calculate x̄i and si without x*.
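The within-laboratory screening step can be sketched as follows; this is a minimal illustration, using the five values reported for laboratory 1 in Table 6 and the critical value 1.715 read from Table 1 for n = 5 at P = 95%.

```python
import statistics

def grubbs_statistic(values):
    """Return PG = |x* - mean| / s and the suspect value x*,
    where x* is the value with the greatest absolute deviation from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    suspect = max(values, key=lambda v: abs(v - mean))
    return abs(suspect - mean) / s, suspect

# Five replicate determinations (laboratory 1 of Table 6)
values = [548, 556, 558, 553, 542]
pg, suspect = grubbs_statistic(values)
is_outlier = pg > 1.715  # critical value for n = 5, P = 95% (Table 1)
```

Here PG ≈ 1.45 < 1.715, so the most deviant value (542) is retained.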

B/ Comparison of variances among laboratories

 

Bartlett Test

The Bartlett test examines both large and small variances.  It tests the null hypothesis that the variances of all laboratories are equal, against the alternative hypothesis that the variances of some laboratories differ.

At least five individual values are required per laboratory.

Calculate the test statistic:

PB = [f · ln s² − Σi fi · ln si²] / C

with fi = ni − 1, f = Σi fi, s² = (Σi fi · si²) / f and

C = 1 + [Σi (1/fi) − 1/f] / [3(m − 1)]

Compare PB with the value indicated in Table 2 at m − 1 degrees of freedom.

If PB > the value in the table, there are differences among the variances.
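A minimal sketch of the PB computation; the three groups shown are a hypothetical subset (laboratories 1, 4 and 5 of Table 6), for which the chi-squared bound at m − 1 = 2 degrees of freedom is 5.99 (Table 2).

```python
import math

def bartlett_statistic(samples):
    """PB for m groups of replicate values; compare with chi-squared
    at m - 1 degrees of freedom (Table 2)."""
    m = len(samples)
    f_i = [len(s) - 1 for s in samples]
    f = sum(f_i)
    var_i = [sum((x - sum(s) / len(s)) ** 2 for x in s) / (len(s) - 1) for s in samples]
    pooled = sum(fi * vi for fi, vi in zip(f_i, var_i)) / f
    c = 1 + (sum(1 / fi for fi in f_i) - 1 / f) / (3 * (m - 1))
    return (f * math.log(pooled) - sum(fi * math.log(vi) for fi, vi in zip(f_i, var_i))) / c

samples = [[548, 556, 558, 553, 542],
           [557, 550, 555, 560, 551],
           [569, 575, 565, 560, 572]]
pb = bartlett_statistic(samples)  # about 0.71, well below 5.99
```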

The Cochran test is used to confirm whether the largest variance from one laboratory is significantly greater than the variances of the other laboratories.

Calculate the test statistic:

PC = s²max / Σi si²

Compare PC with the value shown in Table 3 for m and ni at P = 99%.

If PC > the table value, the variance is significantly greater than the others.
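As a sketch, applying the PC statistic to the si² column of Table 6 (treating every laboratory as if ni = 5, although two laboratories reported more values) singles out laboratory 6:

```python
def cochran_statistic(variances):
    """PC = largest laboratory variance divided by the sum of all variances."""
    return max(variances) / sum(variances)

variances = [41.8, 14.7, 12.3, 17.3, 34.7, 222.6, 31.7, 39.5, 31.7, 30.2]
pc = cochran_statistic(variances)  # about 0.467
# Table 3 gives 0.393 for m = 10, ni = 5 at P = 99%; since 0.467 > 0.393,
# the variance of laboratory 6 is significantly greater than the others.
```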

If there is a significant result from the Bartlett or Cochran tests, eliminate the outlier variance and calculate the statistical test again.

In the absence of a statistical method appropriate to a simultaneous test of several outlier values, the repeated application of the tests is permitted, but should be used with caution.

If the laboratories produce variances that differ sharply from each other, an investigation must be made to find the causes and to decide whether the experimental values found by those laboratories are to be eliminated or not.  If they are, the coordinator will have to consider how representative the remaining laboratories are.

If statistical analysis shows that there are differing variances, this shows that the laboratories have operated the methods at varying precisions.  This may be due to inadequate practice or to lack of clarity or inadequate description in the method.

C/ Systematic errors

Systematic errors made by laboratories are identified using either Fisher's analysis of variance or Dixon's test.

R. A. Fisher analysis of variance

This test is applied to the remaining experimental values from the laboratories with identical variance.

The test is used to identify whether the spread of the mean values from the laboratories is very much greater than the spread of the individual values, expressed by the variance between the laboratories (sL²) and the variance within the laboratories (sr²):

sL² = Σi ni · (x̄i − x̄)² / (m − 1)

sr² = Σi Σj (xij − x̄i)² / (N − m)

Calculate the test statistic:

PF = sL² / sr²

Compare PF with the corresponding value shown in Table 4 (distribution of F) at f1 = m − 1 and f2 = N − m degrees of freedom.

If PF > the table value, it can be concluded that there are differences among the means, that is, there are systematic errors.
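The PF computation can be sketched as a one-way analysis of variance over the retained laboratories; the data below are a tiny hypothetical example.

```python
def anova_pf(samples):
    """Return PF = sL^2 / sr^2 together with f1 = m - 1 and f2 = N - m."""
    m = len(samples)
    N = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    s_l2 = sum(len(s) * (mu - grand) ** 2 for s, mu in zip(samples, means)) / (m - 1)
    s_r2 = sum((x - mu) ** 2 for s, mu in zip(samples, means) for x in s) / (N - m)
    return s_l2 / s_r2, m - 1, N - m

pf, f1, f2 = anova_pf([[1, 2], [3, 4]])  # PF = 8.0 with f1 = 1, f2 = 2
```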

Dixon test

This test enables us to confirm that the mean from one laboratory is greater or smaller than that from the other laboratories.

Take a data series Z(h), h = 1, 2, 3, …, H (here the laboratory means), arranged in increasing order.

Calculate the test statistic Q:

For H = 3 to 7:

Q = [Z(2) − Z(1)] / [Z(H) − Z(1)]   or   Q = [Z(H) − Z(H−1)] / [Z(H) − Z(1)]

For H = 8 to 12:

Q = [Z(2) − Z(1)] / [Z(H−1) − Z(1)]   or   Q = [Z(H) − Z(H−1)] / [Z(H) − Z(2)]

For H = 13 and above:

Q = [Z(3) − Z(1)] / [Z(H−2) − Z(1)]   or   Q = [Z(H) − Z(H−2)] / [Z(H) − Z(3)]

Compare the greater value of Q with the critical values shown in Table 5.

If the test statistic is > the table value at P = 95%, the mean in question can be regarded as an outlier.
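The choice of ratio by series size can be sketched as follows; the nine laboratory means are hypothetical, so the m = 8 to 12 criterion applies, with critical value 0.564 at P = 95% (Table 5).

```python
def dixon_q(z):
    """Largest Dixon ratio for a series of laboratory means."""
    z = sorted(z)
    h = len(z)
    if h <= 7:
        ratios = [(z[1] - z[0]) / (z[-1] - z[0]),
                  (z[-1] - z[-2]) / (z[-1] - z[0])]
    elif h <= 12:
        ratios = [(z[1] - z[0]) / (z[-2] - z[0]),
                  (z[-1] - z[-2]) / (z[-1] - z[1])]
    else:
        ratios = [(z[2] - z[0]) / (z[-3] - z[0]),
                  (z[-1] - z[-3]) / (z[-1] - z[2])]
    return max(ratios)

q = dixon_q([551, 563, 555, 568, 563, 555, 550, 556, 553])  # about 0.294
# 0.294 < 0.564 (m = 9, P = 95%): no laboratory mean is an outlier here.
```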

If there is a significant result in the R. A. Fisher analysis of variance or the Dixon test, eliminate one of the extreme values and calculate the test statistics again with the remaining values.  As regards repeated application of the tests, see the explanations in paragraph B/.

If the systematic errors are found, the corresponding experimental values concerned must not be included in subsequent computations; the cause of the systematic error must be investigated.

D/ Calculating repeatability (r) and reproducibility (R)

From the results remaining after elimination of outliers, calculate the standard deviation of repeatability sr and repeatability r, and the standard deviation of reproducibility sR and reproducibility R, which are shown as characteristic values of the method of analysis.

If there is no difference between the means from the laboratories, then there is no difference between sr and sR, or between r and R.  But if differences among the laboratory means are found, even though these may be tolerated for practical reasons, sr and sR, and r and R, must be reported separately.
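A sketch of the final computation under the one-way ANOVA model; the factor 2.8 (approximately 2√2 at P = 95%) and the between-laboratory variance component follow the ISO 5725 convention, and that component is clamped at zero when the between-laboratory mean square is the smaller one.

```python
import math

def precision_estimates(samples):
    """Return (s_r, r, s_R, R) from the retained laboratory results."""
    m = len(samples)
    N = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    # within-laboratory mean square (repeatability variance)
    msw = sum((x - mu) ** 2 for s, mu in zip(samples, means) for x in s) / (N - m)
    # between-laboratory mean square
    msb = sum(len(s) * (mu - grand) ** 2 for s, mu in zip(samples, means)) / (m - 1)
    n_bar = (N - sum(len(s) ** 2 for s in samples) / N) / (m - 1)
    s_l2 = max((msb - msw) / n_bar, 0.0)  # between-laboratory variance component
    s_r = math.sqrt(msw)
    s_R = math.sqrt(msw + s_l2)
    return s_r, 2.8 * s_r, s_R, 2.8 * s_R
```

When the laboratory means coincide, s_R equals s_r and R equals r, as stated above.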

Bibliography

  • AFNOR, norme NF X 06-041, Fidélité des méthodes d'essai. Détermination de la répétabilité et de la reproductibilité par essais interlaboratoires.
  • DAVIES O. L., GOLDSMITH P. L., Statistical Methods in Research and Production, Oliver and Boyd, Edinburgh, 1972.
  • GOETSCH F. H., KRÖNERT W., OLSCHIMKE D., OTTO U., VIERKÖTTER S., Meth. An., 1978, No 667.
  • GOTTSCHALK G., KAISER K. E., Einführung in die Varianzanalyse und Ringversuche, B.I. Hochschultaschenbücher, Band 775, 1976.
  • GRAF, HENNING, WILRICH, Statistische Methoden bei textilen Untersuchungen, Springer Verlag, Berlin, Heidelberg, New York, 1974.
  • GRUBBS F. E., Sample Criteria for Testing Outlying Observations, The Annals of Mathematical Statistics, 1950, vol. 21, p. 27-58.
  • GRUBBS F. E., Procedures for Detecting Outlying Observations in Samples, Technometrics, 1969, vol. 11, No 1, p. 1-21.
  • GRUBBS F. E. and BECK G., Extension of Sample Sizes and Percentage Points for Significance Tests of Outlying Observations, Technometrics, 1972, vol. 14, No 4, p. 847-854.
  • ISO, norme 5725.
  • KAISER R., GOTTSCHALK G., Elementare Tests zur Beurteilung von Messdaten, B.I. Hochschultaschenbücher, Band 774, 1972.
  • LIENERT G. A., Verteilungsfreie Verfahren in der Biostatistik, Band I, Verlag Anton Haine, Meisenheim am Glan, 1973.
  • NALIMOV V. V., The Application of Mathematical Statistics to Chemical Analysis, Pergamon Press, Oxford, London, Paris, Frankfurt, 1963.
  • SACHS L., Statistische Auswertungsmethoden, Springer Verlag, Berlin, Heidelberg, New York, 1968.

 

Table 1 - Critical values for the Grubbs test

n       P = 95%   P = 99%
3       1,155     1,155
4       1,481     1,496
5       1,715     1,764
6       1,887     1,973
7       2,020     2,139
8       2,126     2,274
9       2,215     2,387
10      2,290     2,482
11      2,355     2,564
12      2,412     2,636

Table 2 – Critical values for the Bartlett test (P = 95%)

f(m − 1)   χ²       f(m − 1)   χ²
1          3,84     21         32,7
2          5,99     22         33,9
3          7,81     23         35,2
4          9,49     24         36,4
5          11,07    25         37,7
6          12,59    26         38,9
7          14,07    27         40,1
8          15,51    28         41,3
9          16,92    29         42,6
10         18,31    30         43,8
11         19,68    35         49,8
12         21,03    40         55,8
13         22,36    50         67,5
14         23,69    60         79,1
15         25,00    70         90,5
16         26,30    80         101,9
17         27,59    90         113,1
18         28,87    100        124,3
19         30,14
20         31,41

Table 3 – Critical values for the Cochran test

        ni = 2           ni = 3           ni = 4           ni = 5           ni = 6
m       99%     95%      99%     95%      99%     95%      99%     95%      99%     95%
2       -       -        0.995   0.975    0.979   0.939    0.959   0.906    0.937   0.877
3       0.993   0.967    0.942   0.871    0.883   0.798    0.834   0.746    0.793   0.707
4       0.968   0.906    0.864   0.768    0.781   0.684    0.721   0.629    0.676   0.590
5       0.928   0.841    0.788   0.684    0.696   0.598    0.633   0.544    0.588   0.506
6       0.883   0.781    0.722   0.616    0.626   0.532    0.564   0.480    0.520   0.445
7       0.838   0.727    0.664   0.561    0.568   0.480    0.508   0.431    0.466   0.397
8       0.794   0.680    0.615   0.516    0.521   0.438    0.463   0.391    0.423   0.360
9       0.754   0.638    0.573   0.478    0.481   0.403    0.425   0.358    0.387   0.329
10      0.718   0.602    0.536   0.445    0.447   0.373    0.393   0.331    0.357   0.303
11      0.684   0.570    0.504   0.417    0.418   0.348    0.366   0.308    0.332   0.281
12      0.653   0.541    0.475   0.392    0.392   0.326    0.343   0.288    0.310   0.262
13      0.624   0.515    0.450   0.371    0.369   0.307    0.322   0.271    0.291   0.246
14      0.599   0.492    0.427   0.352    0.349   0.291    0.304   0.255    0.274   0.232
15      0.575   0.471    0.407   0.335    0.332   0.276    0.288   0.242    0.259   0.220
16      0.553   0.452    0.388   0.319    0.316   0.262    0.274   0.230    0.246   0.208
17      0.532   0.434    0.372   0.305    0.301   0.250    0.261   0.219    0.234   0.198
18      0.514   0.418    0.356   0.293    0.288   0.240    0.249   0.209    0.223   0.189
19      0.496   0.403    0.343   0.281    0.276   0.230    0.238   0.200    0.214   0.181
20      0.480   0.389    0.330   0.270    0.265   0.220    0.229   0.192    0.205   0.174
21      0.465   0.377    0.318   0.261    0.255   0.212    0.220   0.185    0.197   0.167
22      0.450   0.365    0.307   0.252    0.246   0.204    0.212   0.178    0.189   0.160
23      0.437   0.354    0.297   0.243    0.238   0.197    0.204   0.172    0.182   0.155
24      0.425   0.343    0.287   0.235    0.230   0.191    0.197   0.166    0.176   0.149
25      0.413   0.334    0.278   0.228    0.222   0.185    0.190   0.160    0.170   0.144
26      0.402   0.325    0.270   0.221    0.215   0.179    0.184   0.155    0.164   0.140
27      0.391   0.316    0.262   0.215    0.209   0.173    0.179   0.150    0.159   0.135
28      0.382   0.308    0.255   0.209    0.202   0.168    0.173   0.146    0.154   0.131
29      0.372   0.300    0.248   0.203    0.196   0.164    0.168   0.142    0.150   0.127
30      0.363   0.293    0.241   0.198    0.191   0.159    0.164   0.138    0.145   0.124
31      0.355   0.286    0.235   0.193    0.186   0.155    0.159   0.134    0.141   0.120
32      0.347   0.280    0.229   0.188    0.181   0.151    0.155   0.131    0.138   0.117
33      0.339   0.273    0.224   0.184    0.177   0.147    0.151   0.127    0.134   0.114
34      0.332   0.267    0.218   0.179    0.172   0.144    0.147   0.124    0.131   0.111
35      0.325   0.262    0.213   0.175    0.168   0.140    0.144   0.121    0.127   0.108
36      0.318   0.256    0.208   0.172    0.165   0.137    0.140   0.119    0.124   0.106
37      0.312   0.251    0.204   0.168    0.161   0.134    0.137   0.116    0.121   0.103
38      0.306   0.246    0.200   0.164    0.157   0.131    0.134   0.113    0.119   0.101
39      0.300   0.242    0.196   0.161    0.154   0.129    0.131   0.111    0.116   0.099
40      0.294   0.237    0.192   0.158    0.151   0.126    0.128   0.108    0.114   0.097

Table 4 – Critical values for the F-Test (P=99%)

f2\f1   1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
1       4052   4999   5403   5625   5764   5859   5928   5981   6023   6056   6083   6106   6126   6143   6157
2       98.5   99.0   99.2   99.3   99.3   99.3   99.4   99.4   99.4   99.4   99.4   99.4   99.4   99.4   99.4
3       34.1   30.8   29.4   28.7   28.2   27.9   27.7   27.5   27.3   27.2   27.1   27.1   27.0   26.9   26.9
4       21.2   18.0   16.7   16.0   15.5   15.2   15.0   14.8   14.7   14.5   14.5   14.4   14.3   14.2   14.2
5       16.3   13.3   12.1   11.4   11.0   10.7   10.5   10.3   10.2   10.1   9.96   9.89   9.82   9.77   9.72
6       13.7   10.9   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.79   7.72   7.66   7.60   7.56
7       12.2   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.54   6.47   6.41   6.36   6.31
8       11.3   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.73   5.67   5.61   5.56   5.52
9       10.6   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   5.18   5.11   5.05   5.01   4.96
10      10.0   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.77   4.71   4.65   4.60   4.56
11      9.64   7.20   6.21   5.67   5.31   5.07   4.88   4.74   4.63   4.54   4.46   4.39   4.34   4.29   4.25
12      9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39   4.30   4.22   4.16   4.10   4.05   4.01
13      9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19   4.10   4.02   3.96   3.90   3.86   3.82
14      8.86   6.51   5.56   5.04   4.69   4.46   4.28   4.14   4.03   3.94   3.86   3.80   3.75   3.70   3.66
15      8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.73   3.67   3.61   3.56   3.52
16      8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.62   3.55   3.50   3.45   3.41
17      8.40   6.11   5.18   4.67   4.34   4.10   3.93   3.79   3.68   3.59   3.52   3.46   3.40   3.35   3.31
18      8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60   3.51   3.43   3.37   3.32   3.27   3.23
19      8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52   3.43   3.36   3.30   3.24   3.19   3.15
20      8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.29   3.23   3.18   3.13   3.09
21      8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40   3.31   3.24   3.17   3.12   3.07   3.03
22      7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35   3.26   3.18   3.12   3.07   3.02   2.98
23      7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30   3.21   3.14   3.07   3.02   2.97   2.93
24      7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26   3.17   3.09   3.03   2.98   2.93   2.89
25      7.77   5.57   4.68   4.18   3.85   3.63   3.46   3.32   3.22   3.13   3.06   2.99   2.94   2.89   2.85
26      7.72   5.53   4.64   4.14   3.82   3.59   3.42   3.29   3.18   3.09   3.02   2.96   2.90   2.86   2.81
27      7.68   5.49   4.60   4.11   3.78   3.56   3.39   3.26   3.15   3.06   2.99   2.93   2.87   2.82   2.78
28      7.64   5.45   4.57   4.07   3.75   3.53   3.36   3.23   3.12   3.03   2.96   2.90   2.84   2.79   2.75
29      7.60   5.42   4.54   4.04   3.73   3.50   3.33   3.20   3.09   3.00   2.93   2.87   2.81   2.77   2.73
30      7.56   5.39   4.51   4.02   3.70   3.47   3.30   3.17   3.07   2.98   2.91   2.84   2.79   2.74   2.70
40      7.31   5.18   4.31   3.83   3.51   3.29   3.12   2.99   2.89   2.80   2.73   2.66   2.61   2.56   2.52
50      7.17   5.06   4.20   3.72   3.41   3.19   3.02   2.89   2.78   2.70   2.62   2.56   2.51   2.46   2.42
60      7.07   4.98   4.13   3.65   3.34   3.12   2.95   2.82   2.72   2.63   2.56   2.50   2.44   2.39   2.35
70      7.01   4.92   4.07   3.60   3.29   3.07   2.91   2.78   2.67   2.59   2.51   2.45   2.40   2.35   2.31
80      6.96   4.88   4.04   3.56   3.25   3.04   2.87   2.74   2.64   2.55   2.48   2.42   2.36   2.31   2.27
90      6.92   4.85   4.01   3.53   3.23   3.01   2.84   2.72   2.61   2.52   2.45   2.39   2.33   2.29   2.24
100     6.89   4.82   3.98   3.51   3.21   2.99   2.82   2.69   2.59   2.50   2.43   2.37   2.31   2.27   2.22
200     6.75   4.71   3.88   3.41   3.11   2.89   2.73   2.60   2.50   2.41   2.34   2.27   2.22   2.17   2.13
500     6.69   4.65   3.82   3.36   3.05   2.84   2.68   2.55   2.44   2.36   2.29   2.22   2.17   2.12   2.07
∞       6.63   4.61   3.78   3.32   3.02   2.80   2.64   2.51   2.41   2.32   2.25   2.18   2.13   2.08   2.04

Table 4 – Critical values for the F-Test (P=99%) [Continued]

f2\f1   16     17     18     19     20     30     40     50     60     70     80     100    200    500    ∞
1       6169   6182   6192   6201   6209   6261   6287   6303   6313   6320   6326   6335   6350   6361   6366
2       99.4   99.4   99.4   99.4   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5
3       26.8   26.8   26.8   26.7   26.7   26.5   26.4   26.4   26.3   26.3   26.3   26.2   26.2   26.1   26.1
4       14.2   14.1   14.1   14.0   14.0   13.8   13.7   13.7   13.7   13.6   13.6   13.6   13.5   13.5   13.5
5       9.68   9.64   9.61   9.58   9.55   9.38   9.29   9.24   9.20   9.18   9.16   9.13   9.08   9.04   9.02
6       7.52   7.48   7.45   7.42   7.40   7.23   7.14   7.09   7.06   7.03   7.01   6.99   6.93   6.90   6.88
7       6.28   6.24   6.21   6.18   6.16   5.99   5.91   5.86   5.82   5.80   5.78   5.75   5.70   5.67   5.65
8       5.48   5.44   5.41   5.38   5.36   5.20   5.12   5.07   5.03   5.01   4.99   4.96   4.91   4.88   4.86
9       4.92   4.89   4.86   4.83   4.81   4.65   4.57   4.52   4.48   4.46   4.44   4.41   4.36   4.33   4.31
10      4.52   4.49   4.46   4.43   4.41   4.25   4.17   4.12   4.08   4.06   4.04   4.01   3.96   3.93   3.91
11      4.21   4.18   4.15   4.12   4.10   3.94   3.86   3.81   3.77   3.75   3.73   3.70   3.65   3.62   3.60
12      3.97   3.94   3.91   3.88   3.86   3.70   3.62   3.57   3.54   3.51   3.49   3.47   3.41   3.38   3.36
13      3.78   3.74   3.72   3.69   3.66   3.51   3.42   3.37   3.34   3.32   3.30   3.27   3.22   3.19   3.17
14      3.62   3.59   3.56   3.53   3.51   3.35   3.27   3.22   3.18   3.16   3.14   3.11   3.06   3.03   3.00
15      3.49   3.45   3.42   3.40   3.37   3.21   3.13   3.08   3.05   3.02   3.00   2.98   2.92   2.89   2.87
16      3.37   3.34   3.31   3.28   3.26   3.10   3.02   2.97   2.93   2.91   2.89   2.86   2.81   2.78   2.75
17      3.27   3.24   3.21   3.19   3.16   3.00   2.92   2.87   2.83   2.81   2.79   2.76   2.71   2.68   2.65
18      3.19   3.16   3.13   3.10   3.08   2.92   2.84   2.78   2.75   2.72   2.70   2.68   2.62   2.59   2.57
19      3.12   3.08   3.05   3.03   3.00   2.84   2.76   2.71   2.67   2.65   2.63   2.60   2.55   2.51   2.49
20      3.05   3.02   2.99   2.96   2.94   2.78   2.69   2.64   2.61   2.58   2.56   2.54   2.48   2.44   2.42
21      2.99   2.96   2.93   2.90   2.88   2.72   2.64   2.58   2.55   2.52   2.50   2.48   2.42   2.38   2.36
22      2.94   2.91   2.88   2.85   2.83   2.67   2.58   2.53   2.50   2.47   2.45   2.42   2.36   2.33   2.31
23      2.89   2.86   2.83   2.80   2.78   2.62   2.54   2.48   2.45   2.42   2.40   2.37   2.32   2.28   2.26
24      2.85   2.82   2.79   2.76   2.74   2.58   2.49   2.44   2.40   2.38   2.36   2.33   2.27   2.24   2.21
25      2.81   2.78   2.75   2.72   2.70   2.54   2.45   2.40   2.36   2.34   2.32   2.29   2.23   2.19   2.17
26      2.78   2.75   2.72   2.69   2.66   2.50   2.42   2.36   2.33   2.30   2.28   2.25   2.19   2.16   2.13
27      2.75   2.71   2.68   2.66   2.63   2.47   2.38   2.33   2.29   2.27   2.25   2.22   2.16   2.12   2.10
28      2.72   2.68   2.65   2.63   2.60   2.44   2.35   2.30   2.26   2.24   2.22   2.19   2.13   2.09   2.06
29      2.69   2.66   2.63   2.60   2.57   2.41   2.33   2.27   2.23   2.21   2.19   2.16   2.10   2.06   2.03
30      2.66   2.63   2.60   2.57   2.55   2.39   2.30   2.25   2.21   2.18   2.16   2.13   2.07   2.03   2.01
40      2.48   2.45   2.42   2.39   2.37   2.20   2.11   2.06   2.02   1.99   1.97   1.94   1.87   1.85   1.80
50      2.38   2.35   2.32   2.29   2.27   2.10   2.01   1.95   1.91   1.88   1.86   1.82   1.76   1.71   1.68
60      2.31   2.28   2.25   2.22   2.20   2.03   1.94   1.88   1.84   1.81   1.78   1.75   1.68   1.63   1.60
70      2.27   2.23   2.20   2.18   2.15   1.98   1.89   1.83   1.78   1.75   1.73   1.70   1.62   1.57   1.54
80      2.23   2.20   2.17   2.14   2.12   1.94   1.85   1.79   1.75   1.71   1.69   1.65   1.58   1.53   1.49
90      2.21   2.17   2.14   2.11   2.09   1.92   1.82   1.76   1.72   1.68   1.66   1.62   1.55   1.50   1.46
100     2.19   2.15   2.12   2.09   2.07   1.89   1.80   1.74   1.69   1.66   1.63   1.60   1.52   1.47   1.43
200     2.09   2.06   2.03   2.00   1.97   1.79   1.69   1.63   1.58   1.55   1.52   1.48   1.39   1.33   1.28
500     2.04   2.00   1.97   1.94   1.92   1.74   1.63   1.56   1.52   1.48   1.45   1.41   1.31   1.23   1.16
∞       2.00   1.97   1.93   1.90   1.88   1.70   1.59   1.52   1.47   1.43   1.40   1.36   1.25   1.15   1.00

Table 5 – Critical values for the Dixon test

For m = 3 to 7, the test criterion is the greater of:
Q = [Z(2) − Z(1)] / [Z(H) − Z(1)]   and   Q = [Z(H) − Z(H − 1)] / [Z(H) − Z(1)]

m       95%     99%
3       0,970   0,994
4       0,829   0,926
5       0,710   0,821
6       0,628   0,740
7       0,569   0,680

For m = 8 to 12, the test criterion is the greater of:
Q = [Z(2) − Z(1)] / [Z(H − 1) − Z(1)]   and   Q = [Z(H) − Z(H − 1)] / [Z(H) − Z(2)]

m       95%     99%
8       0,608   0,717
9       0,564   0,672
10      0,530   0,635
11      0,502   0,605
12      0,479   0,579

For m = 13 and above, the test criterion is the greater of:
Q = [Z(3) − Z(1)] / [Z(H − 2) − Z(1)]   and   Q = [Z(H) − Z(H − 2)] / [Z(H) − Z(3)]

m       95%     99%
13      0,611   0,697
14      0,586   0,670
15      0,565   0,647
16      0,546   0,627
17      0,529   0,610
18      0,514   0,594
19      0,501   0,580
20      0,489   0,567
21      0,478   0,555
22      0,468   0,544
23      0,459   0,535
24      0,451   0,526
25      0,443   0,517
26      0,436   0,510
27      0,429   0,502
28      0,423   0,495
29      0,417   0,489
30      0,412   0,483
31      0,407   0,477
32      0,402   0,472
33      0,397   0,467
34      0,393   0,462
35      0,388   0,458
36      0,384   0,454
37      0,381   0,450
38      0,377   0,446
39      0,374   0,442
40      0,371   0,438

Table 6 – Results of the collaborative study

Analysis

Sample

Lab nº  Individual values xij                     ni   x̄i    si      si²
1       548  556  558  553  542                    5   551    6,47    41,8
2       300  299  304  308  300                    5   302    3,83    14,7
3       567  558  563  532* 560  560  563  567     7   563    3,51    12,3
4       557  550  555  560  551                    5   555    4,16    17,3
5       569  575  565  560  572                    5   568    5,89    34,7
6       550  546  549  557  588  570  576  568     8   563    14,92   222,6
7       557  560  560  552  547                    5   555    5,63    31,7
8       548  543  560  551  548                    5   550    6,28    39,5
9       558  563  551  555  560                    5   556    5,63    31,7
10      554  559  551  545  557                    5   553    5,5     30,2

* Value identified as an outlier and excluded; ni is the number of retained values.

Statistical figures:

Bartlett test:

Within laboratories: s = 5.37

PB = 3.16 < 15.51 (95%; f = 8)

Between laboratories: s = 13.97, f = 7

Analysis of variance:

sr = 5.37, r = 15

sR = 7.78, R = 22

PF = 6.76 > 3.21 (99%; f1 = 7, f2 = 34)

Reliability of methods

OIV-MA-AS1-08 Reliability of analytical results

Data concerning the reliability of analytical methods, as determined by collaborative studies, are applicable in the following cases:

  1. Verifying the results obtained by a laboratory with a reference method
  2. Evaluating analytical results which indicate a legal limit has been exceeded
  3. Comparing results obtained by two or more laboratories, and comparing those results with a reference value
  4. Evaluating results obtained with a non-validated method

1. Verification of the acceptability of results obtained with a reference method

 

The validity of analytical results depends on the following:

  • the laboratory should perform all analyses within the framework of an appropriate quality control system which includes the organization, responsibilities, procedures, etc.
  • as part of the quality control system, the laboratory should operate according to an internal Quality Control Procedure
  • results should be obtained in accordance with the acceptability criteria described in the internal Quality Control Procedure

Internal quality control shall be established in accordance with internationally recognized standards, such as those in the IUPAC document "Harmonized Guidelines for Internal Quality Control in Analytical Laboratories".

Internal Quality Control implies the analysis of reference material.

Reference samples should have the same matrix as the samples to be analyzed and should contain an appropriate, known concentration of the substance analyzed, similar to that found in the sample.

To the extent possible, reference material shall be certified by an internationally recognized organization.

However, for many types of analysis there are no certified reference materials.  In this case one could use, for example, material analyzed by several laboratories in a proficiency test, taking the average of the results as the value assigned to the substance analyzed.

One could also prepare reference material by formulation (a model solution with known components), or by adding a known quantity of the substance analyzed to a sample which does not contain it (or does not yet contain it), by means of a recovery test (spiked addition) on one of the samples to be analyzed.

Quality control is assured by adding reference material to each series of samples and analyzing these pairs (test samples and reference material).  This verifies correct implementation of the method; it should be independent of the analytical calibration and protocol, since its goal is to verify them.

Series means a number of samples analyzed under repeatable conditions.  Internal controls serve to ensure the appropriate level of uncertainty is not exceeded.

If the analytical results are considered to be part of a normal population whose mean is m and standard deviation is s, only around 0.3% of the results will be outside the limits m ± 3s.  When aberrant results are obtained (outside these limits), the system is considered to be outside statistical control (unreliable data).

The control is graphically represented using Shewhart control graphs.  To produce them, the measured values obtained from the reference material are placed on the vertical axis, while the series numbers are placed on the horizontal axis.  The graph also includes horizontal lines representing the mean m, m ± 2s (warning limits) and m ± 3s (action limits) (Figure 1).

To estimate the standard deviation, a control should be analyzed, in pairs, in at least 12 trials.  Each analytical pair shall be analyzed under repeatable conditions and randomly inserted in a sample series.  Analyses will be duplicated on different days to reflect reasonable changes from one series to another.  Variations can have several causes: changes in the composition of the reactants, instrument re-calibration, and even different operators.  After eliminating aberrant data using the Grubbs test, calculate the standard deviation used to construct the Shewhart graphs.  This standard deviation is compared with that of the reference method.  If the published precision of the reference method is not achieved, the causes should be investigated.

The precision limits of the laboratory should be periodically revised by repeating the indicated procedure.

Once the Quality Control graph is constructed, graph the results obtained from each series for the control material.

A series is considered outside statistical control if:

(I) a value is outside the action limits,

(II) the current and the previous value are outside the warning limits, even if within the action limits,

(III) nine successive values lie on the same side of the mean.
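The three rules can be sketched as a simple check over the control-material series; the mean and standard deviation are those estimated as described above, and the example data are hypothetical.

```python
def out_of_control(history, mean, s):
    """Apply rules I-III to a series of control results (most recent last)."""
    if abs(history[-1] - mean) > 3 * s:                      # I: beyond an action limit
        return True
    if len(history) >= 2 and all(abs(v - mean) > 2 * s for v in history[-2:]):
        return True                                          # II: two beyond warning limits
    if len(history) >= 9:
        recent = history[-9:]
        if all(v > mean for v in recent) or all(v < mean for v in recent):
            return True                                      # III: nine on one side of the mean
    return False
```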

The laboratory response to an "outside control" condition is to reject the results for the series, perform tests to determine the cause, and take action to remedy the situation.

A Shewhart control graph can also be produced for the differences between analytical pairs on the same sample, especially when no reference material exists.  In this case, the absolute difference between the two analyses of the same sample is graphed.  The graph's lower line is 0, the central line is 1.128·sw, and the action limit is 3.686·sw, where sw is the within-series standard deviation.

This type of graph only accounts for repeatability.  The difference between the two analyses should be no greater than the published repeatability limit for the method.

In the absence of control material, it sometimes becomes necessary to verify that the reproducibility limit of the reference method is not exceeded, by comparing the results obtained with those obtained by another laboratory on the same sample.

Each laboratory performs two tests, and the following formula is used:

CD = √(R² − r²/2)

where:

CD = critical difference (P = 0.95)

ȳ1 = mean of the 2 results obtained by laboratory 1

ȳ2 = mean of the 2 results obtained by laboratory 2

R = reproducibility of the reference method

r = repeatability of the reference method

The absolute difference |ȳ1 − ȳ2| is compared with CD.

If the critical difference has been exceeded, the underlying reason is to be found and the test is to be repeated within one month.

2. Evaluation of analytical results indicating that a legal limit has been exceeded

When analytical results indicated that a legal limit has been exceeded, the following procedure should be followed:

In the case of an individual result, conduct a second test under repeatable conditions.  If it is not possible to conduct a second test under repeatable conditions, conduct a double analysis under repeatable conditions and use these data to evaluate the critical difference.

Determine the absolute value of the difference between the mean of the results obtained under repeatable conditions and the legal limit.  An absolute difference greater than the critical difference indicates that the sample does not meet the specification.

The critical difference is calculated by the formula:

CD = (1/√2)·√(R² − r²·(n − 1)/n)

ȳ = mean of the results obtained
m = legal limit
n = number of analyses
R = reproducibility
r = repeatability

In other words, if the limit is a maximum, the mean of the results obtained should not be greater than:

m + (1/√2)·√(R² − r²·(n − 1)/n)

If the limit is a minimum, the mean of the results obtained should not be less than:

m − (1/√2)·√(R² − r²·(n − 1)/n)
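As a sketch only (not part of the OIV text), the legal-limit check above can be coded as follows; the function names are illustrative, and the critical-difference formula is the ISO 5725-type expression assumed here:

```python
from math import sqrt

def critical_difference(R, r, n):
    """CD (P = 0.95) for comparing the mean of n results with a legal limit."""
    return sqrt(R**2 - r**2 * (n - 1) / n) / sqrt(2)

def complies(mean, limit, R, r, n, upper=True):
    """True if the mean stays within limit + CD (maximum limit)
    or limit - CD (minimum limit)."""
    cd = critical_difference(R, r, n)
    return mean <= limit + cd if upper else mean >= limit - cd
```

For a single analysis (n = 1) the repeatability term vanishes and CD reduces to R/√2.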

  3. Comparing results obtained by two or more laboratories and comparing these results to a reference value

 

To determine whether or not results originating in two laboratories are in agreement, calculate the absolute difference between the two results and compare it to the critical difference:

CD = √(R² − r²·(1 − 1/(2n₁) − 1/(2n₂)))

ȳ₁ = mean of the n₁ results obtained by laboratory 1
ȳ₂ = mean of the n₂ results obtained by laboratory 2
n₁ = number of analyses in the laboratory 1 sample
n₂ = number of analyses in the laboratory 2 sample
R = reproducibility of the reference method
r = repeatability of the reference method

If each result is the average of two tests (n₁ = n₂ = 2), the equation simplifies to:

CD = √(R² − r²/2)

If the data are individual results, the critical difference is R.

If the critical difference is not exceeded, the conclusion is that the results of the two laboratories are in agreement.
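As an illustrative sketch (not part of the OIV text), the general two-laboratory comparison above can be coded as follows; note that it reproduces both special cases cited in the text (CD = √(R² − r²/2) for duplicate means and CD = R for individual results):

```python
from math import sqrt

def cd_two_labs(R, r, n1, n2):
    """Critical difference (P = 0.95) between the means of n1 and n2
    results obtained in two laboratories."""
    return sqrt(R**2 - r**2 * (1 - 1/(2*n1) - 1/(2*n2)))
```

Two laboratories are judged in agreement when the absolute difference of their means does not exceed this value.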

Comparing results obtained by several laboratories with a reference value:

Suppose p laboratories have made nᵢ determinations, whose mean for each laboratory is ȳᵢ and whose overall mean is:

Ȳ = (1/p)·Σ ȳᵢ

The mean of all laboratories is compared with the reference value.  If the absolute difference exceeds the critical difference, as calculated using the following formula, we conclude the results are not in agreement with the reference value:

CD = (1/√2)·√[(R² − r²·(1 − (1/p)·Σ 1/nᵢ)) / p]

CD = critical difference, calculated as indicated in point 2, for the reference method.

For example, the reference value can be the value assigned to a reference material, or the value obtained by the same laboratory or by a different laboratory with a different method.

 

  4. Evaluating analytical results obtained using non-validated methods

A provisional reproducibility value can be assigned to a non-validated method by comparing its results with those of a second laboratory:

ȳ₁ = mean of the 2 results obtained by laboratory 1
ȳ₂ = mean of the 2 results obtained by laboratory 2
r = repeatability of the reference method

Provisional reproducibility can be used to calculate critical difference.

If provisional reproducibility is less than twice the value of repeatability, it should be set to 2r.

A reproducibility value greater than three times repeatability or twice the value calculated using the Horwitz equation is not acceptable.

Horwitz equation:

RSDR (%) = 2^(1 − 0.5·log₁₀ C)

RSDR % = relative standard deviation for reproducibility (expressed as a percentage of the mean)
C = concentration, expressed as a decimal fraction (for example, 10 g/100 g = 0.1)

This equation was empirically obtained from more than 3000 collaborative studies including a diverse group of analyzed substances, matrices and measurement techniques.  In the absence of other information, RSDR values that are lower or equal to the RSDR values calculated using the Horwitz equation can be considered acceptable.

RSDR values calculated by the Horwitz equation:

Concentration C    RSDR (%)
10⁻⁹               45
10⁻⁸               32
10⁻⁷               23
10⁻⁶               16
10⁻⁵               11
10⁻⁴               8
10⁻³               5.6
10⁻²               4
10⁻¹               2.8
1                  2
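As a sketch only (not part of the OIV text), the Horwitz prediction can be computed directly; the function name is illustrative:

```python
import math

def horwitz_rsd(c):
    """Predicted reproducibility relative standard deviation (RSDR, %)
    for a concentration c expressed as a decimal mass fraction."""
    return 2 ** (1 - 0.5 * math.log10(c))
```

For example, c = 10⁻⁶ (1 mg/kg) gives 16%, matching the table above.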

If the result obtained using a non-validated method is close to the limit specified by legislation, the decision shall be made as follows (for upper limits):

and, for lower limits,

S = decision limit
L = legal limit
R′ = provisional reproducibility of the non-validated method
R = reproducibility of the reference method
CD = critical difference, calculated as indicated in point 2, for the reference method

The result which exceeds the decision limit should be replaced with a final result obtained using the reference method.

Critical differences for probability levels other than 95%

These differences can be determined by multiplying the critical differences at the 95% level by the coefficients shown in Table 1.

Table 1 - Multiplicative coefficients allowing the calculation of critical differences for probability levels other than 95%

Probability level P (%)    Multiplicative coefficient
90                         0.82
95                         1.00
98                         1.16
99                         1.29
99.5                       1.40

Shewhart control graph


 

Bibliography

  • "Harmonized Guidelines for Internal Quality Control in Analytical Chemistry Laboratories". IUPAC. Pure and Appl. Chem., Vol. 67, nº 4, 649-666, 1995.
  • "Shewhart Control Charts". ISO 8258, 1991.
  • "Precision of test methods - Determination of repeatability and reproducibility for a standard test method by inter-laboratory tests". ISO 5725, 1994.
  • "Draft Commission Regulation establishing rules for the application of reference and routine methods for the analysis and quality evaluation of milk and milk products". Commission of the European Communities, 1995.
  • "Harmonized protocols for the adoption of standardized analytical methods and for the presentation of their performance characteristics". IUPAC. Pure and Appl. Chem., Vol. 62, nº 1, 149-162, 1990.

Protocol for the design, conduct and interpretation of collaborative studies

OIV-MA-AS1-09 Protocol for the design, conduct and interpretation of collaborative studies

Introduction

After a number of meetings and workshops, a group of representatives from 27 organizations adopted by consensus a "Protocol for the design, conduct and interpretation of collaborative studies", which was published in Pure & Appl. Chem. 60, 855-864 (1988). A number of organizations have accepted and used this protocol. As a result of their experience and the recommendations of the Codex Committee on Methods of Analysis and Sampling (Joint FAO/WHO Food Standards Programme, Report of the Eighteenth Session, 9-13 November, 1992; FAO, Rome, Italy, ALINORM 93/23, Sections 34-39), three minor revisions were recommended for incorporation into the original protocol. These are: (1) Delete the double split level design, because the interaction term it generates depends upon the choice of levels and, if it is statistically significant, the interaction cannot be physically interpreted. (2) Amplify the definition of "material". (3) Change the outlier removal criterion from 1% to 2.5%.

The revised protocol incorporating the changes is reproduced below. Some minor editorial revisions to improve readability have also been made. The vocabulary and definitions of the document "Nomenclature of Interlaboratory Studies (Recommendations 1994)" [published in Pure and Appl. Chem., 66, 1903-1911 (1994)] have been incorporated into this revision, and the appropriate terms of the International Organization for Standardization (ISO), modified to be applicable to analytical chemistry, have been used as far as possible.

Protocol

  1. Preliminary work

Method-performance (collaborative) studies require considerable effort and should be conducted only on methods that have received adequate prior testing. Such within-laboratory testing should include, as applicable, information on the following:

1.1.  Preliminary estimates of precision

Estimates of the total within-laboratory standard deviation of the analytical results over the concentration range of interest, as a minimum at the upper and lower limits of the concentration range, with particular emphasis on any standard or specification value.

Note 1: The total within-laboratory standard deviation is a more inclusive measure of imprecision than the ISO repeatability standard deviation, §3.3 below. This standard deviation is the largest of the within-laboratory type precision variables to be expected from the performance of a method; it includes at least variability from different days and preferably from different calibration curves. It includes between-run (between-batch) as well as within-run (within-batch) variations. In this respect it can be considered as a measure of within-laboratory reproducibility. Unless this value is well within acceptable limits, it cannot be expected that the between-laboratory standard deviation (reproducibility standard deviation) will be any better. This precision term is not estimated from the minimum study described in this protocol.

NOTE 2: The total within-laboratory standard deviation may also be estimated from ruggedness trials that indicate how tightly controlled the experimental factors must be and what their permissible ranges are. These experimentally determined ranges should be incorporated into the description of the method.

1.2.  Systematic error (bias)

Estimates of the systematic error of the analytical results over the concentration range and in the substances of interest, as a minimum at the upper and lower limits of the concentration range, with particular emphasis on any standard or specification value.

The results obtained by applying the method to relevant reference materials should be noted.

1.3.  Recoveries

The recoveries of "spikes" added to real materials and to extracts, digests, or other treated solutions thereof.

1.4.  Applicability

The ability of the method to identify and measure the physical and chemical forms of the analyte likely to be present in the materials, with due regard to matrix effects.

1.5.  Interference

The effect of other constituents that are likely to be present at appreciable concentrations in matrices of interest and which may interfere in the determination.

1.6.  Method comparison

The results of comparison of the application of the method with existing tested methods intended for similar purposes.

1.7.  Calibration Procedures

The procedures specified for calibration and for blank correction must not introduce important bias into the results.

1.8.  Method description

The method must be clearly and unambiguously written.

1.9.  Significant figures

The initiating laboratory should indicate the number of significant figures to be reported, based on the output of the measuring instrument.

Note: In making statistical calculations from the reported data, the full power of the calculator or computer is to be used with no rounding or truncating until the final reported mean and standard deviations are achieved. At this point the standard deviations are rounded to 2 significant figures and the means and related standard deviations are rounded to accommodate the significant figures of the standard deviation. For example, if sR = 0.012, the mean is reported as 0.147, not as 0.1473 or 0.15, and RSDR is reported as 8.2%. (Symbols are defined in Appendix 1.) If standard deviation calculations must be conducted manually in steps, with the transfer of intermediate results, the number of significant figures to be retained for squared numbers should be at least 2 times the number of figures in the data plus 1.

  2. Design of the method-performance study

 

2.1.  Number of materials

For a single type of substance, at least 5 materials (test samples) must be used; only when a single-level specification is involved for a single matrix may this minimum required number of materials be reduced to 3. For this design parameter, the two portions of a split level and the two individual portions of blind replicates per laboratory are considered as a single material.

Note 1: A material is an 'analyte/matrix/concentration' combination to which the method-performance parameters apply. This parameter determines the applicability of a method. For application to a number of different substances, a sufficient number of matrices and levels should be chosen to include potential interferences and the concentration of typical use.

Note 2: The 2 or more test samples of blind or open replicates are, statistically, a single material (they are not independent).

NOTE 3: A single split level (Youden pair) statistically analyzed as a pair is a single material; if analyzed statistically and reported as single test samples, they are 2 materials. In addition, the pair can be used to calculate the within-laboratory standard deviation, as

sr = √( Σ(dᵢ − d̄)² / (2(n − 1)) )   (for a split level)

sr = √( Σ dᵢ² / (2n) )   (for duplicates, blind or open)

where dᵢ = the difference between the 2 individual values from the split level (or duplicate pair) for laboratory i and n is the number of laboratories. In this special case, sR, the among-laboratories standard deviation, is merely the average of the two values calculated from the individual components of the split level, and it is used only as a check of the calculations.
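As an illustrative sketch (not part of the protocol text), the two within-laboratory estimates can be computed from the per-laboratory pairs of results; function names are illustrative:

```python
from math import sqrt

def sr_from_duplicates(pairs):
    """s_r from blind or open duplicates: sqrt(sum(d_i^2) / (2n)),
    where d_i is the within-laboratory difference and n the number of labs."""
    d = [a - b for a, b in pairs]
    return sqrt(sum(x * x for x in d) / (2 * len(d)))

def sr_from_split_level(pairs):
    """s_r from a split level (Youden pair):
    sqrt(sum((d_i - dbar)^2) / (2(n - 1)))."""
    d = [a - b for a, b in pairs]
    dbar = sum(d) / len(d)
    return sqrt(sum((x - dbar) ** 2 for x in d) / (2 * (len(d) - 1)))
```

Note the difference: the split-level estimate centres the differences on their mean (the two samples deliberately differ slightly in concentration), while the duplicate estimate assumes the expected difference is zero.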

Note 4: The blank or negative control may be a material or not depending on the usual purpose of the analysis. For example, in trace analysis, where very low levels (near the limit of quantitation) are often sought, the blanks are considered as materials and are necessary to determine certain 'limits of measurement.' However, if the blank is merely a procedural control in macro analysis (e.g., fat in cheese), it would not be considered a material.

2.2.  Number of laboratories

At least 8 laboratories must report results for each material; only when it is impossible to obtain this number (e.g., very expensive instrumentation or specialized laboratories required) may the study be conducted with fewer, but with an absolute minimum of 5 laboratories. If the study is intended for international use, laboratories from different countries should participate. In the case of methods requiring the use of specialized instruments, the study might include the entire population of available laboratories. In such cases, "n" is used in the denominator for calculating the standard deviation instead of "(n - 1)". Subsequent entrants to the field should demonstrate the ability to perform as well as the original participants.

2.3.  Number of Replicates

The repeatability precision parameters must be estimated by using one of the following sets of designs (listed in approximate order of desirability):

2.3.1.      Split Level

For each level that is split and which constitutes only a single material for purposes of design and statistical analysis, use 2 nearly identical test samples that differ only slightly in analyte concentration (e.g., <1-5%). Each laboratory must analyse each test sample once and only once.

Note: The statistical criterion that must be met for a pair of test samples to constitute a split level is that the reproducibility standard deviation of the two parts of the single split level must be equal.

2.3.2.      Combination blind replicates and split level

Use split levels for some materials and blind replicates for other materials in the same study (single values from each submitted test sample).

2.3.3.      Blind replicates

For each material, use blind identical replicates; when data censoring is impossible (e.g., automatic input, calculation, and printout), non-blind identical replicates may be used.

2.3.4.      Known replicates

For each material, use known replicates (2 or more analyses of test portions from the same test sample), but only when it is not practical to use one of the preceding designs.

2.3.5.      Independent analyses

Use only a single test portion from each material (i.e., do not perform multiple analyses) in the study, but rectify the inability to calculate repeatability parameters by quality control parameters or other within-laboratory data obtained independently of the method-performance study.

  3. Statistical analysis (see flowchart, A.4.1)

For the statistical analysis of the data, the required statistical procedures listed below must be performed and the results reported. Supplemental, additional procedures are not precluded.

3.1.  Valid data

Only valid data should be reported and subjected to statistical treatment. Valid data are those data that would be reported as resulting from the normal performance of laboratory analyses; they are not marred by method deviations, instrument malfunctions, unexpected occurrences during performance, or by clerical, typographical and arithmetical errors.

3.2.  One-way analysis of variance

One-way analysis of variance and outlier treatments must be applied separately to each material (test sample) to estimate the components of variance and repeatability and reproducibility parameters.

3.3.  Initial estimation

Calculate the mean, x̄ (= the average of laboratory averages), the repeatability relative standard deviation, RSDr, and the reproducibility relative standard deviation, RSDR, with no outliers removed, but using only valid data.
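As a sketch only (not part of the protocol text), the initial precision estimates can be obtained from a balanced one-way analysis of variance over laboratories; names are illustrative, and the clamp to zero for a negative between-laboratory variance anticipates §4.3.2:

```python
from math import sqrt

def precision_estimates(results):
    """results: list of per-laboratory replicate lists (balanced, k each).
    Returns (mean of laboratory averages, s_r, s_R)."""
    L = len(results)
    k = len(results[0])
    lab_means = [sum(v) / k for v in results]
    grand = sum(lab_means) / L
    # repeatability variance: pooled within-laboratory variance
    s2_r = sum((x - m) ** 2 for v, m in zip(results, lab_means) for x in v) / (L * (k - 1))
    # variance of the laboratory means
    s2_m = sum((m - grand) ** 2 for m in lab_means) / (L - 1)
    s2_L = max(s2_m - s2_r / k, 0.0)  # set s_L = 0 if the estimate is negative
    s2_R = s2_L + s2_r                # reproducibility variance
    return grand, sqrt(s2_r), sqrt(s2_R)
```

Dividing the standard deviations by the mean (× 100) gives RSDr and RSDR.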

3.4.  Outlier treatment

The estimated precision parameters that must also be reported are based on the initial valid data purged of all outliers flagged by the harmonized 1994 outlier removal procedure. This procedure essentially consists of sequential application of the Cochran and Grubbs tests (at the 2.5% probability (P) level, 1-tail for Cochran and 2-tail for Grubbs) until no further outliers are flagged or until a drop of more than 22.2% (= 2/9) in the original number of laboratories providing valid data would occur.

Note: Prompt consultation with a laboratory reporting suspect values may result in correction of mistakes or discovery of conditions that lead to invalid data, 3.1. Recognizing mistakes and invalid data per se is much preferred to relying upon statistical tests to remove deviant values.

3.4.1.      Cochran test

First apply the Cochran outlier test (1-tail test at P = 2.5%) and remove any laboratory whose critical value exceeds the tabular value given in the table, Appendix A.3.1, for the number of laboratories and replicates involved.

3.4.2.      Grubbs tests

Apply the single-value Grubbs test (2-tail) and remove any outlying laboratory. If no laboratory is flagged, then apply the pair-value tests (2-tail) - 2 values at the same end and 1 value at each end, P = 2.5% overall. Remove any laboratory(ies) flagged by these tests whose critical value exceeds the tabular value given in the appropriate column of the table, Appendix A.3.3. Stop removal when the next application of the tests would flag as outliers more than 22.2% (2 of 9) of the laboratories.

Note: The Grubbs tests are to be applied one material at a time to the set of replicate means from all laboratories, and not to the individual values from replicated designs, because the distribution of all the values taken together is multimodal, not Gaussian, i.e., their differences from the overall mean for that material are not independent.

3.4.3.      Final estimation

Recalculate the parameters as in §3.3 after the laboratories flagged by the preceding procedure have been removed. If no outliers were removed by the Cochran-Grubbs sequence, terminate testing. Otherwise, reapply the Cochran-Grubbs sequence to the data purged of the flagged outliers until no further outliers are flagged or until more than a total of 22.2% (2 of 9 laboratories) would be removed in the next cycle. See flowchart A.3.4.

  4. Final report

The final report should be published and should include all valid data. Other information and parameters should be reported in a format similar (with respect to the reported items) to the following, as applicable:

[x] Method-performance tests carried out at the international level in [year(s)] by [organisation] in which [y and z] laboratories participated, each performing [k] replicates, gave the following statistical results:

Table of method-performance parameters

Analyte; Results expressed in [units]

Material [Description and listed in columns across top of table in increasing order of magnitude of means]

Number of laboratories retained after eliminating outliers

Number of outlying laboratories

Code (or designation) of outlying laboratories

Number of accepted results

Mean

True or accepted value, if known

Repeatability standard deviation (Sr)

Repeatability relative standard deviation (RSDr)

Repeatability limit, r (2.8 x Sr)

Reproducibility standard deviation (SR)

Reproducibility relative standard deviation (RSDR)

Reproducibility limit, R (2.8 X SR)

4.1.  Symbols

A set of symbols for use in reports and publications is attached as Appendix 1 (A.1.).

4.2.  Definitions

A set of definitions for use in study reports and publications is attached as Appendix 2 (A.2.).

4.3.  Miscellaneous

4.3.1.      Recovery

Recovery of added analyte as a control on method or laboratory bias should be calculated as follows:

[Marginal] Recovery, % = (total analyte found − analyte originally present) × 100 / (analyte added)

Although the analyte may be expressed as either concentration or amount, the units must be the same throughout. When the quantity of analyte is determined by analysis, it must be determined in the same way throughout.
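As an illustrative sketch (not part of the protocol text), the marginal recovery calculation above is simply:

```python
def marginal_recovery_pct(total_found, originally_present, added):
    """Marginal recovery, % = (total found - originally present) * 100 / added.
    All three quantities must be expressed in the same units."""
    return (total_found - originally_present) * 100.0 / added
```

For example, a sample containing 5 units of analyte, spiked with 10 units and assaying at 14.2 units, gives a recovery of 92%.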

Analytical results should be reported uncorrected for recovery. Report recoveries separately.

4.3.2.      When sL² is negative

By definition, sR is greater than or equal to sr in method-performance studies; occasionally the estimate of sr is greater than the estimate of sR (the variability among replicates is greater than the variability among laboratory averages) and the calculated sL² is then negative. When this occurs, set sL = 0 and sR = sr.

  5. References
  • Horwitz, W. (1988) Protocol for the design, conduct, and interpretation of method performance studies. Pure & Appl. Chem. 60, 855-864.
  • Pocklington, W.D. (1990) Harmonized protocol for the adoption of standardized analytical methods and for the presentation of their performance characteristics. Pure and Appl. Chem. 62, 149-162.
  • International Organization for Standardization. International Standard 5725-1986. Under revision in 6 parts; individual parts may be available from National Standards member bodies.

Appendices

Appendix 1. - Symbols

Use the following set of symbols and terms for designating parameters developed by a method-performance study.

Mean (of laboratory averages): x̄

Standard deviations (estimates): s

  • Repeatability: sr
  • 'Pure' between-laboratory: sL
  • Reproducibility: sR

Variances: s² (with subscripts r, L, and R)

Relative standard deviations: RSD (with subscripts r, L, and R)

Maximum tolerable differences (as defined by ISO 5725-1986; see A.2.4 and A.2.5):

Repeatability limit: r (= 2.8 × sr)

Reproducibility limit: R (= 2.8 × sR)

Number of replicates per laboratory i: kᵢ (general)

Average number of replicates per laboratory: k̄ (= k for a balanced design)

Number of laboratories :L

Number of materials (test samples): m

Total number of values in a given assay: n (= kL for a balanced design)

Total number of values in a given study: N (= kLm for an overall balanced design)

____________________

If other symbols are used, their relationship to the recommended symbols should be explained fully.

Appendix 2. -  Definitions

Use the following definitions. The first three definitions utilize the IUPAC document "Nomenclature of Interlaboratory Studies" (approved for publication 1994). The next two definitions are assembled from components given in ISO 3534-1:1993. All test results are assumed to be independent, i.e., 'obtained in a manner not influenced by any previous result on the same or similar test object. Quantitative measures of precision depend critically on the stipulated conditions. Repeatability and reproducibility conditions are particular sets of extreme stipulated conditions.'

A.2.1 Method-performance study

An interlaboratory study in which all laboratories follow the same written protocol and use the same test method to measure a quantity in sets of identical test items [test samples, materials]. The reported results are used to estimate the performance characteristics of the method. Usually these characteristics are within-laboratory and among-laboratories precision, and when necessary and possible, other pertinent characteristics such as systematic error, recovery, internal quality control parameters, sensitivity, limit of determination, and applicability.

A.2.2 Laboratory-performance study

An interlaboratory study that consists of one or more analyses or measurements by a group of laboratories on one or more homogeneous, stable test items, by the method selected or used by each laboratory. The reported results are compared with those of other laboratories or with the known or assigned reference value, usually with the objective of evaluating or improving laboratory performance.

A.2.3 Material-certification study

An interlaboratory study that assigns a reference value ('true value') to a quantity (concentration or property) in the test item, usually with a stated uncertainty.

A.2.4 Repeatability limit (r)

When the mean of the values obtained from two single determinations with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time lies within the range of the mean values cited in the Final Report, 4.0, the absolute difference between the two test results obtained should be less than or equal to the repeatability limit r (= 2.8 × sr) that can generally be inferred by linear interpolation from the Report.

Note: This definition, and the corresponding definition for reproducibility limit, has been assembled from five cascading terms and expanded to permit application by interpolation to a test item whose mean is not the same as that used to establish the original parameters, which is the usual case in applying these definitions. The term 'repeatability [and reproducibility] limit' is applied specifically to a probability of 95% and is taken as 2.8 × sr [or sR]. The general term for this statistical concept applied to any measure of location (e.g., median) and with other probabilities (e.g., 99%) is "repeatability [and reproducibility] critical difference".

A.2.5 Reproducibility limit (R)

When the mean of the values obtained from two single determinations with the same method on identical test items in different laboratories with different operators using different equipment lies within the range of the mean values cited in the Final Report, 4.0, the absolute difference between the two test results obtained should be less than or equal to the reproducibility limit R (= 2.8 × sR) that can generally be inferred by linear interpolation from the Report.

Note 1: When the results of the interlaboratory test make it possible, the value of r and R can be indicated as a relative value (e.g., as a percentage of the determined mean value) as an alternative to the absolute value.

Note 2: When the final reported result in the study is an average derived from more than a single value, i.e., k is greater than 1, the value for R must be adjusted according to the following formula before using R to compare the results of single routine analyses between two laboratories:

R′ = √(R² − r²·(1 − 1/k))

Similar adjustments must be made for replicate results constituting the final values for r and R, if these will be the reported parameters used for quality control purposes.

Note 3: The repeatability limit, r, may be interpreted as the amount within which two determinations should agree with each other within a laboratory 95% of the time. The reproducibility limit, R, may be interpreted as the amount within which two separate determinations conducted in different laboratories should agree with each other 95% of the time.

Note 4: Estimates of sR can be obtained only from a planned, organized method-performance study; estimates of sr can be obtained from routine work within a laboratory by use of control charts. For occasional analyses, in the absence of control charts, within-laboratory precision may be approximated as one half sR (Pure and Appl. Chem., 62, 149-162 (1990), Sec. 1.3, Note).

A.2.6 One-way analysis of variance

One-way analysis of variance is the statistical procedure for obtaining the estimates of within laboratory and between-laboratory variability on a material-by-material basis. Examples of the calculations for the single level and single-split-level designs can be found in ISO 5725-1986.

Appendix 3. – Critical values

A.3.1 Critical values for the Cochran maximum variance ratio at the 2.5% (1-tail) rejection level, expressed as the percentage the highest variance is of the total variance; r = number of replicates.

No. of labs   r=2    r=3    r=4    r=5    r=6
4             94.3   81.0   72.5   65.4   62.5
5             88.6   72.6   64.6   58.1   53.9
6             83.2   65.8   58.3   52.2   47.3
7             78.2   60.2   52.2   47.3   42.3
8             73.6   55.6   47.4   43.0   38.5
9             69.3   51.8   43.3   39.3   35.3
10            65.5   48.6   39.9   36.2   32.6
11            62.2   45.8   37.2   33.6   30.3
12            59.2   43.1   35.0   31.3   28.3
13            56.4   40.5   33.2   29.2   26.5
14            53.8   38.3   31.5   27.3   25.0
15            51.5   36.4   29.9   25.7   23.7
16            49.5   34.7   28.4   24.4   22.0
17            47.8   33.2   27.1   23.3   21.2
18            46.0   31.8   25.9   22.4   20.4
19            44.3   30.5   24.8   21.5   19.5
20            42.8   29.3   23.8   20.7   18.7
21            41.5   28.2   22.9   19.9   18.0
22            40.3   27.2   22.0   19.2   17.3
23            39.1   26.3   21.2   18.5   16.6
24            37.9   25.5   20.5   17.8   16.0
25            36.7   24.8   19.9   17.2   15.5
26            35.5   24.1   19.3   16.6   15.0
27            34.5   23.4   18.7   16.1   14.5
28            33.7   22.7   18.1   15.7   14.1
29            33.1   22.1   17.5   15.3   13.7
30            32.5   21.6   16.9   14.9   13.3
35            29.3   19.5   15.3   12.9   11.6
40            26.0   17.0   13.5   11.6   10.2
50            21.6   14.3   11.4    9.7    8.6

Tables A.3.1 and A.3.3 were calculated by R. Albert (October, 1993) by computer simulation involving several runs of approximately 7000 cycles each for each value, and then smoothed. Although Table A.3.1 is strictly applicable only to a balanced design (same number of replicates from all laboratories), it can be applied to an unbalanced design without too much error, if there are only a few deviations.

A.3.2 Calculation of the Cochran maximum variance outlier ratio

Compute the within-laboratory variance for each laboratory, divide the largest of these variances by the sum of all of the variances, and multiply by 100. The resulting quotient is the Cochran statistic, which indicates the presence of a removable outlier if it exceeds the critical value listed above in the Cochran table for the number of replicates and laboratories specified.
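As an illustrative sketch (not part of the protocol text), the Cochran statistic described above can be computed from the per-laboratory replicate lists:

```python
def cochran_statistic(replicates):
    """replicates: list of per-laboratory replicate lists.
    Returns 100 * (largest within-lab variance) / (sum of all variances)."""
    def var(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)
    variances = [var(v) for v in replicates]
    return 100.0 * max(variances) / sum(variances)
```

The result is compared with the tabular value in A.3.1 for the given number of laboratories and replicates.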

A.3.3 Critical values for the Grubbs extreme deviation outlier tests at the 2.5% (2-tail), 1.25% (1-tail) rejection level, expressed as the percent reduction in standard deviation caused by the removal of the suspect value(s).

No. of labs   One highest   Two highest     One highest and
              or lowest     or two lowest   one lowest
4             86.1          98.9            99.1
5             73.5          90.9            92.7
6             64.0          81.3            84.0
7             57.0          73.1            76.2
8             51.4          66.5            69.6
9             46.8          61.0            64.1
10            42.8          56.4            59.5
11            39.3          52.5            55.5
12            36.3          49.1            52.1
13            33.8          46.1            49.1
14            31.7          43.5            46.5
15            29.9          41.2            44.1
16            28.3          39.2            42.0
17            26.9          37.4            40.1
18            25.7          35.9            38.4
19            24.6          34.5            36.9
20            23.6          33.2            35.4
21            22.7          31.9            34.0
22            21.9          30.7            32.8
23            21.2          29.7            31.8
24            20.5          28.8            30.8
25            19.8          28.0            29.8
26            19.1          27.1            28.9
27            18.4          26.2            28.1
28            17.8          25.4            27.3
29            17.4          24.7            26.6
30            17.1          24.1            26.0
40            13.3          19.1            20.5
50            11.1          16.2            17.3

A.3.4 Calculation of the Grubbs test values

To calculate the single Grubbs test statistic, compute the average for each laboratory and then calculate the standard deviation (SD) of these L averages (designated the original s). Calculate the SD of the set of averages with the highest average removed (sH); calculate the SD of the set of averages with the lowest average removed (sL). Then calculate the percentage decrease in SD for both as follows:

  • 100 × [1 − (sL/s)] and 100 × [1 − (sH/s)]

The higher of these two percentage decreases is the single Grubbs test statistic, which signals the presence of an outlier to be omitted at the P = 2.5% level, 2-tail, if it exceeds the critical value listed in the single-value column, Column 2, of Table A.3.3, for the number of laboratory averages used to calculate the original s.

To calculate the paired Grubbs test statistics, calculate the percentage decrease in standard deviation obtained by dropping the two highest averages and also by dropping the two lowest averages, as above. Compare the higher of the percentage decreases with the tabular values in column 3 and proceed with (1) or (2): (1) If the tabular value is exceeded, remove the responsible pair and repeat the cycle, starting at the beginning with the Cochran extreme variance test, then the single Grubbs extreme value test, then the paired Grubbs extreme value tests. (2) If no further values are removed, then calculate the percentage decrease in standard deviation obtained by dropping both the highest extreme value and the lowest extreme value together, and compare it with the tabular values in the last column of A.3.3. If the tabular value is exceeded, remove the high-low pair of averages, and start the cycle again with the Cochran test until no further values are removed. In all cases, stop outlier testing when more than 22.2% (2/9) of the averages would be removed.
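As an illustrative sketch (not part of the protocol text), the single-Grubbs statistic described in A.3.4 can be computed as follows; names are illustrative:

```python
from math import sqrt

def sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(values) / len(values)
    return sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def single_grubbs_pct(lab_means):
    """Single Grubbs test statistic: the larger percent reduction in SD
    obtained by removing either the highest or the lowest laboratory mean."""
    s = sd(lab_means)
    ordered = sorted(lab_means)
    s_low = sd(ordered[1:])    # lowest average removed
    s_high = sd(ordered[:-1])  # highest average removed
    return 100.0 * max(1 - s_low / s, 1 - s_high / s)
```

The statistic is then compared with Column 2 of Table A.3.3; the paired tests follow the same pattern with two values removed at a time.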

Appendix 4

  1. A.4.1. Flowchart for outlier removal

Estimation of the detection and quantification limits of a method of analysis

OIV-MA-AS1-10 Estimation of the detection and quantification limits of a method of analysis

  1. Purpose: to establish the detection and quantification limits of a method

N.B.: The proposed calculation procedure sets "detection and quantification limit" values with respect to the instrumental response. For a given method, the final calculation of these values must take into account factors arising from the preparation of the sample.

 

  1. Definitions

 

Detection limit: the smallest concentration or proportion of the analyzed substance that can be detected with an acceptable level of uncertainty, but that is not quantified under the experimental conditions described in the method.

Quantification limit: the smallest concentration or proportion of the analyzed substance that can be quantified with an acceptable level of uncertainty, under the experimental conditions described in the method.

 

  1. Logic Diagram for Decision-Making

 

 

  1. Methodology

 

4.1.  "Results" approach

For analytical methods that produce a direct numerical result (e.g., colorimetry), the detection limit (LD) and the quantification limit (LQ) are estimated using one of the two following methods.

4.1.1.      Method 1:

Directly read n measurements (analyte quantity or response) of separate analytical "blank" samples that contain all of the constituents, with the exception of the substance to be tested for.

LD = m + 3s and LQ = m + 10s

where m and s are the mean and standard deviation of the n measurements.

Note: A multiplication factor of 3 corresponds to a 0.13% chance of concluding that the substance sought is present when, in fact, it is absent. A factor of 10 corresponds to a 0.5% chance.
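Method 1 amounts to a two-line calculation from the blank readings. A minimal Python sketch (the function name is ours), assuming the n blank measurements are available as a list:

```python
import statistics

def blank_limits(blank_measurements):
    """Estimate LD and LQ from n measurements of analytical blanks:
    LD = m + 3*s and LQ = m + 10*s, where m is the mean and s the
    standard deviation of the blank measurements."""
    m = statistics.mean(blank_measurements)
    s = statistics.stdev(blank_measurements)
    return m + 3 * s, m + 10 * s
```

The values returned are in instrumental-response units; as noted above, conversion to the final method limits must also account for sample preparation.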

4.1.2.      Method 2:

Using the straight calibration line: Y = a + bX

The detection limit is the smallest concentration of a substance that can be distinguished from the blank, with a 0.13% risk of retaining samples that contain none of it; in other words, the value beginning at which a statistical test comparing the response to 0 becomes significant with an error level α of 0.13%. Hence:

LD = 3 Sa / b and LQ = 10 Sa / b

where Sa is the standard deviation of the ordinate at the origin (intercept) of the straight regression line. The logic is the same for LQ, where the multiplication factor is 10 (risk of 0.5%).
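Method 2 can be reproduced with an ordinary least-squares fit of the calibration line Y = a + bX. The sketch below (function name ours) estimates Sa as the standard deviation of the intercept and returns 3·Sa/b and 10·Sa/b; it assumes replicate-free calibration data and is illustrative rather than normative:

```python
import math

def calibration_limits(x, y):
    """LD = 3*Sa/b and LQ = 10*Sa/b from a straight calibration line
    y = a + b*x, with Sa the standard deviation of the intercept a."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # residual standard deviation about the regression line
    s_res = math.sqrt(sum((yi - (a + b * xi)) ** 2
                          for xi, yi in zip(x, y)) / (n - 2))
    # standard deviation of the intercept
    s_a = s_res * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return 3 * s_a / b, 10 * s_a / b
```

Dividing by the slope b expresses both limits in concentration units rather than response units.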

4.2.  "Graph" Approach

For analytical methods which generate graphs (e.g., chromatography), the detection limit is estimated from the background (baseline) noise of the analytical blank recording for a given sample.

LD = 3 x h x R (associated risk is below 0.13%) and

LQ = 10 x h x R (associated risk is below 0.5%), where

h is the average or maximum amplitude of the signal in a window corresponding to 10 widths of the peak at mid-height (W1/2) on either side of the retention time, depending on the stability of the baseline;

R is the quantity/signal response factor, expressed as quantity of substance per unit height.

On each occasion, three series of three injections each are performed on test blanks at an interval of several days.

4.2.1.      Method (hmax)

increase the background noise to the maximum (Fig. 1 above);

center around the retention time (RT) of the product;

draw a window of 10 widths of the mid-height peak (W1/2) on either side of the RT;

draw two parallel lines, one running through the highest point of the highest peak, the other through the base of the deepest trough;

evaluate the height between them -> hmax;

calculate the response factor (R factor);

LD = 3 x hmax x R

LQ = 10 x hmax x R

4.2.2.      Method (ȳ)

increase the background noise to the maximum (Fig. 2 above);

center around the retention time (RT) of the product;

draw a window of 10 widths of the mid-height peak (W1/2) on either side of the RT;

divide it into 20 equal sections (x);

draw two parallel lines in each block, one running through the highest point of the highest peak, the other through the base of the deepest trough;

measure the heights, y;

calculate the average of the 20 heights (ȳ);

calculate the response factor (R factor);

LD = 3 x ȳ x R

LQ = 10 x ȳ x R
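Both graph methods reduce to the same arithmetic once the amplitude(s) have been read off the recording. A minimal sketch (function name ours): pass a single amplitude for the hmax method, or the 20 block amplitudes for the averaged method.

```python
def graph_limits(heights, response_factor):
    """LD = 3*h*R and LQ = 10*h*R, where h is the mean of the supplied
    peak-to-trough amplitudes (one value for hmax, 20 for the averaged
    method) and R is the quantity/height response factor."""
    h = sum(heights) / len(heights)
    return 3 * h * response_factor, 10 * h * response_factor
```

For example, twenty identical block amplitudes of 0.5 with R = 0.2 give LD = 0.3 and LQ = 1.0 in quantity units.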

These estimates can themselves be validated by injecting quantities of solute  that are close to the calculated limits (Figures 3 and 4).

Figure No. 3: Validating calculations of limits. Concentration of the compound approaching the calculated limit.

N.B.: The dotted line corresponds to the real injected value; however, since this figure is provided as an example, it may be deleted from the final text.

Figure No. 4: Validating calculations of limits. Concentration of the compound between LD and LQ.

N.B.: The dotted line corresponds to the real injected value; however, since this figure is provided as an example, it may be deleted from the final text.

Harmonized guidelines for internal quality control in analytical chemistry laboratories

OIV-MA-AS1-11 Harmonized guidelines for internal quality control in analytical chemistry laboratories

Contents

  1. Introduction
    1.   Basic concepts
    2.   Scope of this document
    3.   Internal quality control and uncertainty
  2. Definitions
    1.   International definitions
    2.   Definitions of terms specific to this document
  3. Quality assurance practices and internal quality control
    1.   Quality assurance
    2.   Choice of analytical method
    3.   Internal quality control and proficiency tests
  4. Internal quality control procedures
    1.   Introduction
    2.   General approach. Statistical control
    3.   Internal quality control and fitness for purpose
    4.   The nature of errors
  5. IQC and within-run precision
    1.   Precision and duplication
    2.   Interpretation of duplicate data
  6. Control materials in IQC
    1.   Introduction
    2.   The role of certified reference materials
    3.   Preparation of control material
    4.   Blank determinations
    5.   Traceability in spiking and recovery checks
  7. Recommendations
  8. Conclusions
  9. References

Appendix 1. Shewhart control charts

 

  1. Introduction

1.1.  Basic concepts

This document sets out guidelines for the implementation of internal quality control (IQC) in analytical laboratories. IQC is one of a number of concerted measures that analytical chemists can take to ensure that the data produced in the laboratory are fit for their intended purpose. In practice, fitness for purpose is determined by a comparison of the accuracy achieved in a laboratory at a given time with a required level of accuracy.  Internal quality control therefore comprises the routine practical procedures that enable the analytical chemist to accept a result or group of results as fit for purpose, or reject the results and repeat the analysis.  As such, IQC is an important determinant of the quality of analytical data, and is recognised as such by accreditation agencies.

Internal quality control is undertaken by the inclusion of particular reference materials, here called "control materials", into the analytical sequence and by duplicate analysis.  The control materials should, wherever possible, be representative of the test materials under consideration in respect of matrix composition, the state of physical preparation and the concentration range of the analyte.  As the control materials are treated in exactly the same way as the test materials, they are regarded as surrogates that can be used to characterise the performance of the analytical system, both at a specific time and over longer intervals.

Internal quality control is a final check of the correct execution of all of the procedures (including calibration) that are prescribed in the analytical protocol and all of the other quality assurance measures that underlie good analytical practice. IQC is therefore necessarily retrospective.  It is also required to be as far as possible independent of the analytical protocol, especially the calibration, that it is designed to test.

Ideally both the control materials and those used to create the calibration should be traceable to appropriate certified reference materials or a recognised empirical reference method. When this is not possible, control materials should be traceable at least to a material of guaranteed purity or other well characterised material. However, the two paths of traceability must not become coincident at too late a stage in the analytical process.  For instance, if control materials and calibration standards were prepared from a single stock solution of analyte, IQC would not detect any inaccuracy stemming from the incorrect preparation of the stock solution.

In a typical analytical situation several, or perhaps many, similar test materials will be analysed together, and control materials will be included in the group.  Often determinations will be duplicated by the analysis of separate test portions of the same material.  Such a group of materials is referred to in this document as an analytical "run".  (The words "set", "series" and "batch" have also been used as synonyms for "run".)  Runs are regarded as being analysed under effectively constant conditions.  The batches of reagents, the instrument settings, the analyst, and the laboratory environment will, under ideal conditions, remain unchanged during analysis of a run.  Systematic errors should therefore remain constant during a run, as should the values of the parameters that describe random errors. As the monitoring of these errors is of concern, the run is the basic operational unit of IQC.

A run is therefore regarded as being carried out under repeatability conditions, i.e., the random measurement errors are of a magnitude that would be encountered in a "short" period of time.  In practice the analysis of a run may occupy sufficient time for small systematic changes to occur.  For example, reagents may degrade, instruments may drift, minor adjustments to instrumental settings may be called for, or the laboratory temperature may rise.  However, these systematic effects are, for the purposes of IQC, subsumed into the repeatability variations.  Sorting the materials making up a run into a randomised order converts the effects of drift into random errors.

1.2.  Scope of this document

This document is a harmonisation of IQC procedures that have evolved in various fields of analysis, notably clinical biochemistry, geochemistry and environmental studies, occupational hygiene and food analysis(3-9). There is much common ground in the procedures from these various fields. Analytical chemistry comprises an even wider range of activities, and the basic principles of IQC should be able to encompass all of these. The present document provides guidelines that will be applicable in most instances. This policy necessarily excludes a number of IQC practices that are restricted to individual sectors of the analytical community. In addition, in some sectors it is common to combine IQC as defined here with other aspects of quality assurance practice. There is no harm in such combination, but it must remain clear which aspects are essential to IQC.

In order to achieve a harmonisation and provide basic guidance on IQC, some types of analytical activity have been excluded from this document.  Issues specifically excluded are as follows.

(i)     Quality control of sampling.  While it is recognised that the quality of the analytical result can be no better than that of the sample, quality control of sampling is a separate subject and in many areas is not fully developed.  Moreover, in many instances analytical laboratories have no control over sampling practice and quality.

(ii)    In-line analysis and continuous monitoring.  In this style of analysis there is no possibility of repeating the measurement, so the concept of IQC as used in this document is inapplicable.

(iii)  Multivariate IQC.  Multivariate methods in IQC are still the subject of research and cannot be regarded as sufficiently established for inclusion here. The current document regards multianalyte data as requiring a series of univariate IQC tests. Caution is necessary in the interpretation of this type of data to avoid inappropriately frequent rejection of data.

(iv)  Statutory and contractual requirements.

(v)    Quality assurance measures such as checks on instrumental stability before and during analysis, wavelength calibration, balance calibration, tests on resolution of chromatography columns, and problem diagnostics are not included. For present purposes they are regarded as part of the analytical protocol, and IQC tests their effectiveness together with the other aspects of the methodology.

 

1.3.  Internal quality control and uncertainty

A prerequisite of analytical chemistry is the recognition of "fitness for purpose", the standard of accuracy that is required for an effective use of the analytical data. This standard is arrived at by consideration of the intended uses of the data, although it is seldom possible to foresee all of the potential future applications of analytical results. For this reason, and in order to prevent inappropriate interpretation, it is important that a statement of the uncertainty should accompany analytical results, or be readily available to those who wish to use the data.

Strictly speaking, an analytical result cannot be interpreted unless it is accompanied by knowledge of its associated uncertainty at a stated level of confidence.  A simple example demonstrates this principle.  Suppose that there is a statutory requirement that a foodstuff must not contain more than 10 μg g-1 of a particular constituent. A manufacturer analyses a batch and obtains a result of 9 μg g-1 for that constituent.  If the uncertainty of the result expressed as a half range (assuming no sampling error) is 0.1 μg g-1 (i.e., the true result falls, with a high probability, within the range 8.9-9.1 μg g-1), then it may be assumed that the legal limit is not exceeded.  If, in contrast, the uncertainty is 2 μg g-1, then there is no such assurance.  The interpretation and use that may be made of the measurement thus depends on the uncertainty associated with it.

Analytical results should therefore have an associated uncertainty if any definite meaning is to be attached to them or an informed interpretation made.  If this requirement cannot be fulfilled, the use to which the data can be put is limited.  Moreover, the achievement of the required measurement uncertainty must be tested as a routine procedure, because the quality of data can vary, both in time within a single laboratory and between different laboratories.  IQC comprises the process of checking that the required uncertainty is achieved in a run.

  1. Definitions

2.1.  International definitions

Quality assurance. All those planned and systematic actions necessary to provide adequate confidence that a product or service will satisfy given requirements for quality(10).

Trueness: closeness of the agreement between the average value obtained from a large series of test results and an accepted reference value(11).

Precision: closeness of agreement between independent test results obtained under prescribed conditions(12).

Bias: difference between the expectation of the test results and an accepted reference value(11).

Accuracy: closeness of the agreement between the result of a measurement and a true value of the measurand(13).

Note 1.  Accuracy is a qualitative concept.

Note 2.  The term precision should not be used for accuracy.

Error: result of a measurement minus a true value of the measurand(13).

Repeatability conditions: conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time(11).

Uncertainty of measurement:  parameter, associated with the result of a measurement, that characterises the dispersion of the values that could reasonably be attributed to the measurand(14).

Note 1. The parameter may be, for example, a standard deviation (or a given multiple of it), or the half-width of an interval having a stated level of confidence.

Note 2. Uncertainty of measurement comprises, in general, many components.  Some of these components may be evaluated from the statistical distribution of results of a series of measurements and can be characterised by experimental standard deviations.  The other components, which can also be characterised by standard deviations, are evaluated from assumed probability distributions based on experience or other information.

Note 3. It is understood that the result of a measurement is the best estimate of the value of a measurand, and that all components of uncertainty, including those arising from systematic effects, such as components associated with corrections and reference standards, contribute to the dispersion.

Traceability: property of the result of a measurement or the value of a standard whereby it can be related to stated references, usually national or international standards, through an unbroken chain of comparisons all having stated uncertainties(13).

Reference material: material or substance, one or more of whose property values are sufficiently homogeneous and well established to be used for the calibration of an apparatus, the assessment of a measurement method, or for assigning values to materials(13).

Certified reference material:  reference material, accompanied by a certificate, one or more of whose property values are certified by a procedure which establishes its traceability to an accurate realisation of the unit in which the property values are expressed, and for which each certified value is accompanied by an uncertainty at a stated level of confidence(13).

 

2.2.  Definitions of  terms specific to this document

Internal quality control: set of procedures undertaken by laboratory staff for the continuous monitoring of operation and the results of measurements in order to decide whether results are reliable enough to be released.

Control material:  material used for the purposes of internal quality control and subjected to the same or part of the same measurement procedure as that used for test materials.

Run (analytical run):  set of measurements performed under repeatability conditions.

Fitness for purpose:  degree to which data produced by a measurement process enables a user to make technically and administratively correct decisions for a stated purpose.

Analytical system: range of circumstances that contribute to the quality of analytical data, including equipment, reagents, procedures, test materials, personnel, environment and quality assurance measures.

 

  1. Quality assurance practices and internal quality control

 

3.1.  Quality assurance

Quality assurance is the essential organisational infrastructure that underlies all reliable analytical measurements. It is concerned with achieving appropriate levels in matters such as staff training and management, adequacy of the laboratory environment, safety, the storage, integrity and identity of samples, record keeping, the maintenance and calibration of instruments, and the use of technically validated and properly documented methods. Failure in any of these areas might undermine vigorous efforts elsewhere to achieve the desired quality of data.  In recent years these practices have been codified and formally recognised as essential.  However, the prevalence of these favourable circumstances by no means ensures the attainment of appropriate data quality unless IQC is conducted.

 

3.2.  Choice of analytical method

It is important that laboratories restrict their choice of methods to those that have been characterised as suitable for the matrix and analyte of interest.  The laboratory must possess documentation describing the performance characteristics of the method, estimated under appropriate conditions.

The use of a method does not in itself guarantee the achievement of its established performance characteristics. There is, for a given method, only the potential to achieve a certain standard of reliability when the method is applied under a particular set of circumstances.  It is this collection of circumstances, known as the "analytical system",  that is therefore responsible for the accuracy of analytical data. Hence it is important to monitor the analytical system in order to achieve fitness for purpose. This is the aim of the IQC measures undertaken in a laboratory.

 

3.3.  Internal quality control and proficiency tests

Proficiency testing is a periodic assessment of the performance of individual laboratories and groups of laboratories that is achieved by the distribution by an independent testing body of typical materials for unsupervised analysis by the participants(2).  Although important, participation in proficiency testing schemes is not a substitute for IQC measures, or vice versa.

Proficiency testing schemes can be regarded as a routine, but relatively infrequent, check on analytical errors.  Without the support of a well‑developed IQC system, the value of participation in a proficiency test is negligible.  Probably the main beneficial effect of proficiency tests is that of encouraging participants to install effective quality control systems.  It has been shown that laboratories with effective IQC systems performed better in a proficiency testing scheme(15).

  1. Internal quality control procedures

 

4.1.  Introduction

Internal quality control involves the practical steps undertaken to ensure that errors in analytical data are of a magnitude appropriate for the use to which the data will be put.  The practice of IQC depends on the use of  two strategies, the analysis of reference materials to monitor trueness and statistical control, and duplication to monitor precision.

The basic approach to IQC involves the analysis of control materials alongside the test materials under examination.  The outcome of the control analyses forms the basis of a decision regarding the acceptability of the test data.  Two key points are worth noting in this context.

(i)     The interpretation of control data must be based on documented, objective criteria, and on statistical principles wherever possible.

(ii)    The results of control analyses should be viewed primarily as indicators of the performance of the analytical system, and only secondarily as a guide to the errors associated with individual test results.  Substantial changes in the apparent accuracy of control determinations can sometimes be taken to imply similar changes to data for contemporary test materials, but correction of analytical data on the basis of this premise is unacceptable.

4.2.  General Approach ‑ Statistical Control

The interpretation of the results of IQC analyses depends largely on the concept of statistical control, which corresponds with stability of operation. Statistical control implies that an IQC result x can be interpreted as arising independently and at random from a normal population with mean μ and variance σ².

Under these constraints only about 0.27% of results (x) would fall outside the bounds of μ ± 3σ. When such extreme results are encountered they are regarded as being "out-of-control" and interpreted to mean that the analytical system has started to behave differently.  Loss of control therefore implies that the data produced by the system are of unknown accuracy and hence cannot be relied upon.  The analytical system therefore requires investigation and remedial action before further analysis is undertaken. Compliance with statistical control can be monitored graphically with Shewhart control charts (see Appendix 1). An equivalent numerical approach, comparing values of z = (x − μ)/σ against appropriate values of the standard normal deviate, is also possible.
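The numerical form of this check can be sketched as follows. This is an illustrative Python sketch, not part of the guidelines; the 2σ warning and 3σ action limits used here are the conventional Shewhart choices.

```python
def control_status(x, mu, sigma):
    """Classify an IQC result by its z-score against conventional
    Shewhart-style limits: |z| > 3 -> out of control ("action"),
    |z| > 2 -> "warning", otherwise in statistical control."""
    z = (x - mu) / sigma
    if abs(z) > 3:
        return "action"
    if abs(z) > 2:
        return "warning"
    return "in control"
```

For example, with μ = 10.0 and σ = 0.2, a control result of 11.0 (z = 5) signals an out-of-control condition.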

 

4.3.  Internal quality control and fitness for purpose.

For the most part, the process of IQC is based on a description in terms of the statistical parameters of an ongoing analytical system in normal operation.  Control limits are therefore based on the estimated values of these parameters rather than measures derived from considerations of fitness for purpose.  Control limits must be narrower than the requirements of fitness for purpose or the analysis would be futile.

The concept of statistical control is inappropriate, however, when the so-called ad hoc analysis is being undertaken.  In ad hoc analysis the test materials may be unfamiliar or rarely encountered, and runs are often made up of only a few such test materials.  Under these circumstances there is no statistical basis for the construction of control charts.  In such an instance the analytical chemist has to use fitness for purpose criteria, historical data or consistency with the visual properties of the test material for judging the acceptability of the results obtained.

Either way, agreed methods of establishing quantitative criteria to characterise fitness for purpose would be desirable.  Unfortunately, this is one of the less-developed aspects of IQC.  In specific application areas guidelines may emerge by consensus. For example, in environmental studies it is usually recognised that relative uncertainties of less than ten percent in the concentration of a trace analyte are rarely of consequence. In food analysis the Horwitz curve(16) is sometimes used as a fitness for purpose criterion. Such criteria have been defined for clinical analysis(17,18). In some areas of applied geochemistry a systematic approach has given rise to fitness for purpose criteria for sampling and analytical precisions. However, it is not practicable here to give guidelines in these areas, and at present no general principles can be advanced that would allow specific applications to be addressed.

 

4.4.  The nature of errors

Two main categories of analytical error are recognised, namely random errors and systematic errors, which give rise to imprecision and bias respectively.  The importance of categorising errors in this way lies in the fact that they have different sources, remedies and consequences for the interpretation of data.

Random errors determine the precision of measurement.  They cause random positive and negative deviations of results about the underlying mean value. Systematic errors comprise displacement of the mean of many determinations from the true value. For the purposes of IQC two levels of systematic error are worth consideration.

(i)     Persistent  bias affects the analytical system (for a given type of test material) over a long period and affects all data.  Such bias, if small in relation to random error, may be identifiable only after the analytical system has been in operation for a long time. It might be regarded as tolerable, provided it is kept within prescribed bounds.

(ii)    The run effect is exemplified by a deviation of the analytical system during a particular run.  This effect, where it is sufficiently large, will be identified by IQC at the time of occurrence as an out-of-control condition.

The conventional division of errors between the random and the systematic depends on the timescale over which the system is viewed.  Run effects of unknown source can be regarded in the long‑term as the manifestation of a random process.  Alternatively, if a shorter‑term view is taken, the same variation could be seen as a bias-like change affecting a particular run.

The statistical model used for IQC in this document is as follows[1]. The value of a measurement (x) in a particular run is given by:

  • x = true value + persistent bias + run effect + random error (+ gross error).

The variance of x (σx²) in the absence of gross errors is given by:

  • σx² = σ0² + σ1²

where

σ0² = variance of the random error (within run) and

σ1² = variance of the run effect.

The variances of the true value and the persistent bias are both zero. An analytical system in control is fully described by σ0², σ1² and the value of the persistent bias. Gross errors are implied when the analytical system does not comply with such a description.
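The model can be illustrated by simulation. In the Python sketch below (all numerical values are arbitrary, and normal distributions are assumed for both effects), each run draws one shared run effect and each measurement adds an independent random error, so the long-term variance of x approaches σ0² + σ1²:

```python
import random
import statistics

random.seed(1)
TRUE, BIAS = 50.0, 0.4      # hypothetical true value and persistent bias
SIGMA0, SIGMA1 = 0.5, 0.3   # within-run SD and run-effect SD

results = []
for run in range(2000):                      # many runs...
    run_effect = random.gauss(0.0, SIGMA1)   # one deviation shared by the run
    for rep in range(3):                     # ...of three measurements each
        results.append(TRUE + BIAS + run_effect + random.gauss(0.0, SIGMA0))

# long-term variance approaches SIGMA0**2 + SIGMA1**2 = 0.34,
# and the long-term mean approaches TRUE + BIAS = 50.4
print(statistics.variance(results))
print(statistics.mean(results))
```

Viewed over a single run, however, the run effect is indistinguishable from a bias, which is the point made in the preceding paragraph.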

 

  1. IQC and within-run precision

5.1.  Precision and duplication

A limited control of within-run precision is achieved by the duplication within a run of measurements made on test materials. The objective is to ensure that the differences between paired results are consistent with or better than the level implied by the value of σ0 used by a laboratory for IQC purposes[2].  Such a test alerts the user to the possibility of poor within-run precision and provides additional information to help in interpreting control charts.  The method is especially useful in ad hoc analysis, where attention is centred on a single run and information obtained from control materials is unlikely to be completely satisfactory.

As a general approach, all of the test materials, or a random selection from them, are analysed in duplicate. The absolute differences |x1 − x2| between duplicated analytical results x1 and x2 are tested against an upper control limit based on an appropriate value of σ0. However, if the test materials in the run have a wide range of concentration of analyte, no single value of σ0 can be assumed(19).

Duplicates for IQC must reflect as far as possible the full range of variation present in the run. They must not be analysed as adjacent members of the run, otherwise they will reveal only the smallest possible measure of analytical variability. The best placing of duplicates is at random within each run. Moreover the duplication required for IQC requires the complete and independent analysis (preferably blind) of separate test portions of the test material.  A duplication of the instrumental measurement of a single test solution would be ineffective because the variations introduced by the preliminary chemical treatment of the test material would be absent.

5.2.  Interpretation of duplicate data

5.2.1.      Narrow concentration range. In the simplest situation the test materials comprising the run have a small range of analyte concentrations, so that a common within-run standard deviation σ0 can be applied.

A value of this parameter must be estimated to provide a control limit. The difference d = x1 − x2 between duplicated results has standard deviation √2 σ0; the upper 95% bound of |d| is therefore 1.96√2 σ0, and on average only about three in a thousand results should exceed 3√2 σ0. A group of n duplicated results can be interpreted in several ways.

For example, the standardised difference

  • zi = (x1 − x2) / (√2 σ0)

should have a normal distribution with zero mean and unit standard deviation.  The sum of a group of n such results would have a standard deviation of √n, so only about three runs in a thousand would produce a value of |Σzi| exceeding 3√n. Alternatively, a group of n values of zi from a run can be combined to form Σzi², and the result interpreted as a sample from a chi-squared distribution with n degrees of freedom (χ²n). Some caution is needed in the use of this statistic, however, as it is sensitive to outlying results.
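These run-level checks translate directly into code. An illustrative Python sketch (function names ours) computes the standardised difference for each duplicate pair, the normalised sum of the z values, and the chi-squared-distributed sum of squares:

```python
import math

def duplicate_z(x1, x2, sigma0):
    """Standardised difference for one duplicated result;
    approximately N(0, 1) when the run is in control."""
    return (x1 - x2) / (math.sqrt(2) * sigma0)

def run_checks(pairs, sigma0):
    """Run-level summaries for n duplicate pairs: the sum of the z values
    divided by sqrt(n) is ~N(0, 1), and the sum of squares is ~chi2(n)."""
    zs = [duplicate_z(x1, x2, sigma0) for x1, x2 in pairs]
    n = len(zs)
    return sum(zs) / math.sqrt(n), sum(z * z for z in zs)
```

The first statistic detects a run-wide drift between first and second determinations; the second (to be compared with chi-squared critical values for n degrees of freedom) detects general excess variability, though it is sensitive to single outlying pairs.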

5.2.2.      Wide concentration range.  If the test materials comprising a run have a wide range of analyte concentrations, no common standard of precision (σ0) can be assumed.  In such an instance σ0 must be expressed as a functional relationship with concentration.  The concentration for a particular material is taken to be (x1 + x2)/2, and an appropriate value of σ0 is obtained from the functional relationship, the parameters of which have to be estimated in advance.

 

  1. Control materials in IQC

6.1.  Introduction

Control materials are characterised substances that are inserted into the run alongside the test materials and subjected to exactly the same treatment. A control material must contain an appropriate concentration of the analyte, and a value of that concentration must be assigned to the material. Control materials act as surrogates for the test materials and must therefore be representative, i.e., they should be subject to the same potential sources of error.  To be fully representative, a control material must have the same matrix in terms of bulk composition, including minor constituents that may have a bearing on accuracy. It should also be in a similar physical form, i.e., state of comminution, as the test materials. There are other essential characteristics of a control material. It must be adequately stable over the period of interest.  It must be possible to divide the control material into effectively identical portions for analysis.  It is often required in large amounts to allow its use over an extended period.

Reference materials in IQC are used in combination with control charts that allow both persistent bias and run effects to be addressed (Appendix 1). Persistent bias is evident as a significant deviation of the centre line from the assigned value. The variation in the run effect is predictable in terms of a standard deviation when the system is under statistical control, and that standard deviation is used to define action limits and warning limits at appropriate distances from the true value.

6.2.  The role of certified reference materials

Certified reference materials (CRM) as defined in Section 2 (i.e., with a statement of uncertainty and traceability), when available and of suitable composition, are ideal control materials in that they can be regarded for traceability purposes as ultimate standards of trueness(20).  In the past CRMs were regarded as being for reference purposes only and not for routine use.  A more modern approach is to treat CRMs as consumable and therefore suitable for IQC.

The use of CRMs in this way is, however, subject to a number of constraints.

(i)     Despite the constantly increasing range of CRMs available, for the majority of analyses there is no closely matching CRM available.

(ii)    Although the cost of CRMs is not prohibitive in relation to the total costs of analysis, it may not be possible for a laboratory with a wide range of activities to stock every relevant kind of reference material.

(iii)  The concept of the reference material is not applicable to materials where either the matrix or the analyte is unstable.

(iv)  CRMs are not necessarily available in sufficient amounts to provide for IQC use over extended periods.

(v)    It must be remembered that not all apparently certified reference materials are of equal quality.  Caution is suggested when the information on the certificate is inadequate.

If for any of the above reasons the use of a CRM is not appropriate it falls on individual laboratories or groups of laboratories to prepare their own control materials and assign traceable[3] values of analyte concentration to them.  Such a material is sometimes referred to as a "house reference material" (HRM). Suggestions for preparing HRMs are listed in Section 6.3. Not all of the methods described there are applicable to all analytical situations.

6.3.  Preparation of control materials

6.3.1.      Assigning a true value by analysis. In principle a working value can be assigned to a stable reference material simply by careful analysis. However, precautions are necessary to avoid biases in the assigned value. This requires some form of independent check such as may be provided by analysis of the materials in a number of laboratories and where possible, the use of methods based on different physico-chemical principles. Lack of attention to independent validation of control materials has been shown to be a weakness in IQC systems(15).

One way of establishing a traceable assigned value in a control material is to analyse a run comprising the candidate material and a selection of matching CRMs, with replication and randomisation. This course of action would be appropriate if only limited amounts of CRMs were available. The CRMs must be appropriate in both matrix composition and analyte concentration. The CRMs are used directly to calibrate the analytical procedure for the analysis of the control material.  An appropriate analytical method is a prerequisite for this approach. It would be a dangerous approach if, say, a minor and variable fraction of the analyte were extracted for measurement. The uncertainty introduced into the assigned value must also be considered.

6.3.2.      Materials validated in proficiency testing comprise a valuable source of control materials. Such materials would have been analysed by many laboratories using a variety of methods. In the absence of counter-indications, such as an obvious bias or unusual frequency distribution of results, the consensus of the laboratories could be regarded as a validated assigned value to which a meaningful uncertainty could be attached. (There is a possibility that the consensus could suffer from a bias of consequence, but this potential is always present in reference values.) There would be a theoretical problem of establishing the traceability of such a value, but that does not detract from the validity of the proposed procedure. The range of such materials available would be limited, but organisers of proficiency tests could ensure a copious supply by preparing batches of material in excess of the immediate requirements of the round.  The normal requirements of stability would have to be demonstrable.

6.3.3.      Assigning a true value by formulation. In favourable instances a control material can be prepared simply by mixing constituents of known purity in predetermined amounts.  For example, this approach would often be satisfactory in instances where the control material is a solution.  Problems are often encountered in formulation in producing solid control materials in a satisfactory physical state or in ensuring that the speciation and physical distribution of the analyte in the matrix is realistic. Moreover an adequate mixing of the constituents must be demonstrable.

6.3.4.      Spiked control materials. "Spiking" is a way of creating a control material in which a value is assigned by a combination of formulation and analysis. This method is feasible when a test material essentially free of the analyte is available.  After exhaustive analytical checks to ensure the background level is adequately low, the material is spiked with a known amount of analyte.  The reference sample prepared in this way is thus of the same matrix as the test materials to be analysed and of known analyte level - the uncertainty in the assigned concentration is limited only by the possible error in the unspiked determination.  However, it may be difficult to ensure that the speciation, binding and physical form of the added analyte is the same as that of the native analyte and that the mixing is adequate.

6.3.5.      Recovery checks.  If the use of a reference material is not practicable then a limited check on bias is possible by a test of recovery. This is especially useful when analytes or matrices cannot be stabilised or when ad hoc analysis is executed. A test portion of the test material is spiked with a known amount of the analyte and analysed alongside the original test material. The recovery of the added analyte (known as the "marginal recovery") is the difference between the two measurements divided by the amount that is added.  The obvious advantages of recovery checks are that the matrix is representative and the approach is widely applicable ‑ most test materials can be spiked by some means.  However, the recovery check suffers from the disadvantage previously noted regarding the speciation, binding and physical distribution of the analyte. Furthermore, the assumption of an equivalent recovery of the analyte added as a spike and of the native analyte may not be valid.  However, it can normally be assumed that a poor performance in a recovery check is strongly indicative of a similar or worse performance for the native analyte in the test materials.

Spiking and recovery testing as an IQC method must be distinguished from the method of standard additions, which is a measurement procedure: a single spiking addition cannot be used to fulfil the roles of both measurement and IQC.
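The marginal-recovery arithmetic described above is a one-line calculation. As a minimal sketch (the function name and the example figures are invented for illustration):

```python
def marginal_recovery(spiked_result, unspiked_result, amount_added):
    """Marginal recovery: the difference between the spiked and unspiked
    measurements divided by the amount of analyte added.
    All three quantities must be in the same concentration units."""
    if amount_added <= 0:
        raise ValueError("spike amount must be positive")
    return (spiked_result - unspiked_result) / amount_added

# hypothetical figures: unspiked portion 2.10 mg/L, spiked with
# 1.00 mg/L of analyte, spiked portion found to contain 3.02 mg/L
rec = marginal_recovery(3.02, 2.10, 1.00)   # 0.92, i.e. 92 % recovery
```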

6.4.  Blank determinations

Blank determinations are nearly always an essential part of the analytical process and can conveniently be effected alongside the IQC protocol. The simplest form of blank is the "reagent blank", where the analytical procedure is executed in all respects apart from the addition of the test portion. This kind of blank, in fact, tests more than the purity of the reagents. For example it is capable of detecting contamination of the analytical system originating from any source, e.g., glassware and the atmosphere, and is therefore better described as a "procedural blank". In some instances, better execution of blank determinations is achieved if a simulated test material is employed. The simulant could be an actual test material known to be virtually analyte-free or a surrogate (e.g., ashless filter paper used instead of plant material).  Where it can be contrived, the best type of blank is the "field blank", which is a typical matrix with zero concentration of analyte.

An inconsistent set of blanks in a run suggests sporadic contamination and may add weight to IQC evidence suggesting the rejection of the results.  When an analytical protocol prescribes the subtraction of a blank value, the blank value must be subtracted also from the results of the control materials before they are used in IQC.

6.5.  Traceability in spiking and recovery checks

Potential problems of the traceability of reagents used for spikes and recovery checks must be guarded against.  Under conditions where CRMs are not available, traceability can often be established only to the batch of analyte provided by a manufacturer. In such cases, confirmation of identity and a check on purity must be made before use. A further precaution is that the calibration standards and spike should not be traceable to the same stock solution of analyte or the same analyst.  If such a common traceability existed, then the corresponding sources of error would not be detected by the IQC.

 

  7. Recommendations

The following recommendations represent integrated approaches to IQC that are suitable for many types of analysis and application areas. Managers of laboratory quality systems will have to adapt the recommendations to the demands of their own particular requirements. Such adaptation could be implemented, for example, by adjusting the number of duplicates and control materials inserted into a run, or by the inclusion of any additional measures favoured in the particular application area. The procedure finally chosen and its accompanying decision rules must be codified in an IQC protocol that is separate from the analytical system protocol.

The practical approach to quality control is determined by the frequency with which the measurement is carried out and the size and nature of each run. The following recommendations are therefore made.  The use of control charts and decision rules is covered in Appendix 1.

In each of the following the order in the run in which the various materials are analysed should be randomised if possible. A failure to randomise may result in an underestimation of various components of error.

(i)     Short (e.g., n < 20) frequent runs of similar materials. Here the concentration range of the analyte in the run is relatively small, so a common value of standard deviation can be assumed.

Insert a control material at least once per run.  Plot either the individual values obtained, or the mean value, on an appropriate control chart.  Analyse in duplicate at least half of the test materials, selected at random.  Insert at least one blank determination.

(ii)    Longer (e.g., n>20) frequent runs of similar materials.  Again a common level of standard deviation is assumed.

Insert the control material at an approximate frequency of one per ten test materials.  If the run size is likely to vary from run to run it is easier to standardise on a fixed number of insertions per run and plot the mean value on a control chart of means.  Otherwise plot individual values.

Analyse in duplicate a minimum of five test materials selected at random.  Insert one blank determination per ten test materials.

(iii)  Frequent runs containing similar materials but with a wide range of analyte concentration. Here we cannot assume that a single value of standard deviation is applicable.

Insert control materials in total numbers approximately as recommended above.  However, there should be at least two levels of analyte represented, one close to the median level of typical test materials, and the other approximately at the upper or lower decile as appropriate.  Enter values for the two control materials on separate control charts.  Duplicate a minimum of five test materials, and insert one procedural blank per ten test materials.

(iv)  Ad hoc analysis.  Here the concept of statistical control is not applicable.  It is assumed, however, that the materials in the run are of a single type, i.e., sufficiently similar for general conclusions on errors to be made.

Carry out duplicate analysis on all of the test materials. Carry out spiking or recovery tests or use a formulated control material, with an appropriate number of insertions (see above), and with different concentrations of analyte if appropriate. Carry out blank determinations.  As no control limits are available, compare the bias and precision with fitness-for-purpose limits or other established criteria.
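Recommendation (ii) above, including the required randomisation of the run order, can be sketched as a run-design routine. This is an illustrative Python sketch: the insertion frequencies follow the text, but the function name and data structure are our own.

```python
import random

def design_run(test_ids, controls_per=10, duplicates=5, blanks_per=10, seed=None):
    """Sketch of recommendation (ii): one control material and one blank
    per ten test materials, five test materials duplicated at random,
    and the order of the whole run randomised."""
    rng = random.Random(seed)
    items = [("test", t) for t in test_ids]
    # duplicate five test materials selected at random
    for t in rng.sample(test_ids, min(duplicates, len(test_ids))):
        items.append(("duplicate", t))
    items += [("control", i) for i in range(max(1, len(test_ids) // controls_per))]
    items += [("blank", i) for i in range(max(1, len(test_ids) // blanks_per))]
    rng.shuffle(items)   # randomised order guards against order-related bias
    return items

run = design_run(list(range(30)), seed=1)
```

A failure to randomise, as the text notes, may cause some components of error to be underestimated; the shuffle step is therefore not optional.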

  8. Conclusions

Internal quality control is an essential aspect of ensuring that data released from a laboratory are fit for purpose. If properly executed, quality control methods can monitor the various aspects of data quality on a run-by-run basis. In runs where performance falls outside acceptable limits, the data produced can be rejected and, after remedial action on the analytical system, the analysis can be repeated.

It must be stressed, however, that internal quality control is not foolproof even when properly executed. Obviously it is subject to "errors of both kinds", i.e., runs that are in control will occasionally be rejected and runs that are out of control occasionally accepted. Of more importance, IQC cannot usually identify sporadic gross errors or short‑term disturbances in the analytical system that affect the results for individual test materials.  Moreover, inferences based on IQC results are applicable only to test materials that fall within the scope of the analytical method validation. Despite these limitations, which professional experience and diligence can alleviate to a degree, internal quality control is the principal recourse available for ensuring that only data of appropriate quality are released from a laboratory. When properly executed it is very successful.

Finally, it must be appreciated that a perfunctory execution of any quality system will not guarantee the production of data of adequate quality. The correct procedures for feedback, remedial action and staff motivation must also be documented and acted upon. In other words, there must be a genuine commitment to quality within a laboratory for  an internal quality control programme to succeed, i.e., the IQC must be part of a total quality management system.

  9. References
  • "Protocol for the Design, Conduct and Interpretation of Method Performance Studies",  Edited W Horwitz,  Pure Appl. Chem.,  1988, 60, 855‑ 864.  (Revision in press)
  • "The International Harmonised Protocol for the Proficiency Testing of (Chemical) Analytical Laboratories", Edited M Thompson and R Wood,  Pure Appl. Chem., 1993, 65, 2123-2144.  (Also published in J. AOAC International, 1993, 76, 926-940.)
  • "IFCC approved recommendations on quality control in clinical chemistry.  Part 4: internal quality control",  J. Clin. Chem. Clin. Biochem., 1980, 18, 534-541.
  • S Z Cekan, S B Sufi and E W Wilson,  "Internal quality control for assays of reproductive hormones: Guidelines for laboratories".  WHO, Geneva, 1993.
  • M Thompson, "Control procedures in geochemical analysis", in R J Howarth (Ed), "Statistics and data analysis in geochemical prospecting", Elsevier, Amsterdam, 1983.
  • M Thompson, "Data quality in applied geochemistry: the requirements and how to achieve them",   J. Geochem. Explor., 1992, 44, 3-22.
  • Health and Safety Executive, "Analytical quality in workplace air monitoring", London, 1991.
  • "A protocol for analytical quality assurance in public analysts' laboratories", Association of Public Analysts, 342 Coleford Road, Sheffield S9 5PH, UK, 1986.
  • "Method evaluation, quality control, proficiency testing" (AMIQAS PC Program), National Institute of Occupational Health, Denmark, 1993.
  • ISO 8402:1994.  "Quality assurance and quality management - vocabulary".
  • ISO 3534 -1: 1993 (E/F).  "Statistics, vocabulary and symbols - Part 1: Probability and general statistical terms".
  • ISO Guide 30:1992.  "Terms and definitions used in connection with reference materials".
  • "International vocabulary for basic and general terms in metrology" , 2nd Edition, 1993, ISO, Geneva.
  • "Guide to the expression of uncertainty in measurement", ISO, Geneva, 1993.
  • M Thompson and P J Lowthian, Analyst,  1993, 118, 1495-1500.
  • W Horwitz,  L R Kamps and K W Boyer,  J. Assoc. Off. Anal. Chem.,  1980, 63, 1344.
  • D Tonks,  Clin. Chem.,  1963, 9, 217-223.
  • G C Fraser, P H Petersen, C Ricos and R Haeckel, "Proposed quality specifications for the imprecision and inaccuracy of analytical systems for clinical chemistry", Eur. J. Clin. Chem. Clin. Biochem., 1992, 30, 311-317.
  • M Thompson,  Analyst, 1988, 113, 1579-1587.
  • ISO Guide 33: 1989, "Uses  of Certified Reference Materials",  Geneva.

Appendix 1. Shewhart control charts

  1. Introduction

The theory, construction and interpretation of the Shewhart chart(1) are detailed in numerous texts on process quality control and applied statistics, and in several ISO standards(2-5). There is a considerable literature on the use of the control chart in clinical chemistry(6,7). Westgard and co-workers have formulated multiple rules for the interpretation of such control charts(8), and the power of these rules has been studied in detail(9,10). In this appendix only simple Shewhart charts are considered.

In IQC a Shewhart control chart is obtained when values of concentration measured on a control material in successive runs are plotted on a vertical axis against the run number on the horizontal axis.  If more than one analysis of a particular control material is made in a run, either the individual results x or their mean can be used to form a control chart.  The chart is completed by horizontal lines derived from the normal distribution N(μ, σ²) that is taken to describe the random variations in the plotted values. The lines selected for control purposes are μ ± 2σ and μ ± 3σ. Different values of σ are required for charts of individual values and of means. For a system in statistical control, on average about one in twenty values falls outside the μ ± 2σ lines, called the "warning limits", and only about three in one thousand fall outside the μ ± 3σ lines, the "action limits".  In practice the sample estimates x̄ and s of the parameters μ and σ are used to construct the chart. A persistent bias is indicated by a significant difference between x̄ and the assigned value.
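The construction of the chart lines from historical control-material results can be sketched as follows. This is a minimal Python illustration using the sample mean and standard deviation as estimates of μ and σ; the function name and the example values are invented.

```python
import statistics

def shewhart_limits(values):
    """Estimate mu and sigma by the sample mean and standard deviation
    of historical control-material results, then place warning limits
    at +/- 2s and action limits at +/- 3s about the centre line."""
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    return {"centre": xbar,
            "warning": (xbar - 2 * s, xbar + 2 * s),
            "action": (xbar - 3 * s, xbar + 3 * s)}

# hypothetical control-material results from five successive runs
lims = shewhart_limits([9.8, 10.2, 10.0, 9.9, 10.1])
```

Note that, as the next section explains, the standard deviation used here must include the between-run component of variance; basing s on repeats within a single run would give unduly narrow limits.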

  2. Estimates of the parameters μ and σ

An analytical system under control exhibits two sources of random variation: the within-run, characterised by variance σ₀², and the between-run, with variance σ₁². The two variances are typically comparable in magnitude. The standard deviation used in a chart of individual values is given by

σ = √(σ₀² + σ₁²)

whereas for a control chart of mean values the standard deviation is given by

σₘ = √(σ₀²/n + σ₁²)

where n is the number of control measurements in a run from which the mean is calculated. The value of n therefore must be constant from run to run, otherwise control limits would be impossible to define. If a fixed number of repeats of a control material per run cannot be guaranteed (e.g., if the run length were variable) then charts of individual values must be used.  Furthermore the equations indicate that σ or σₘ must be estimated with care.  An attempt to base an estimate on repeat values from a single run would result in unduly narrow control limits.

Estimates must therefore include the between-run component of variance.  If the use of a particular value of n can be assumed at the outset, then σₘ can be estimated directly from the m means

x̄ᵢ (i = 1, ..., m)  of the n repeats in each of m successive runs.

Thus the estimate of μ is

x̿ = (Σ x̄ᵢ)/m

and the estimate of σₘ is

sₘ = √( Σ (x̄ᵢ − x̿)² / (m − 1) )

If the value of n is not predetermined, then separate estimates of σ₀² and σ₁² could be obtained by one-way analysis of variance.  If the mean squares within- and between-groups are MSW and MSB respectively, then

σ₀² is estimated by MSW and

σ₁² is estimated by (MSB − MSW)/n
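The variance-component estimation by one-way analysis of variance can be sketched in Python. This is illustrative only: the run data are invented, and the between-run estimate is floored at zero when the between-groups mean square falls below the within-groups mean square, a common practical convention.

```python
import statistics

def variance_components(runs):
    """One-way ANOVA on m runs of n repeats each: the within-run
    variance is estimated by the within-groups mean square MSW, and
    the between-run variance by (MSB - MSW)/n, floored at zero."""
    m, n = len(runs), len(runs[0])
    means = [statistics.mean(run) for run in runs]
    grand = statistics.mean(means)          # valid because every run has n repeats
    msw = sum((v - mu) ** 2
              for run, mu in zip(runs, means)
              for v in run) / (m * (n - 1))
    msb = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    return msw, max(0.0, (msb - msw) / n)

# hypothetical data: 3 runs of 2 repeats of a control material
s0sq, s1sq = variance_components([[10, 12], [14, 16], [12, 14]])
```

The chart of means would then use a standard deviation of √(s0sq/n + s1sq).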

Often in practice it is necessary to initiate a control chart with data collected from a small number of runs, which may be to a degree unrepresentative, as estimates of standard deviation are very variable unless large numbers of observations are used.  Moreover, during the initial period, the occurrence of out-of-control conditions is more than normally likely and will produce outlying values.  Such values would bias the estimated mean and inflate s beyond its proper value. It is therefore advisable to recalculate the mean and s after a further "settling down" period.  One method of obviating the effects of outliers in the calculation is to reject them after the application of Dixon's Q or Grubbs' test(11), and then use the classical statistics given above.  Alternatively, the methods of robust statistics could be applied to the data(12,13).

  3. The interpretation of control charts

The following simple rules can be applied to control charts of individual results or of means.

 

Single control chart.  An out-of-control condition in the analytical system is signalled if any of the following occur.

(i)     The current plotting value falls outside the action limits.

(ii)    The current value and the previous plotting value both fall outside the warning limits but within the action limits.

(iii)  Nine successive plotting values fall on the same side of the mean line.
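The three single-chart rules can be sketched as one predicate. This is an illustrative Python sketch; the function and argument names are our own, and `history` is assumed to hold the plotting values with the most recent last.

```python
def out_of_control(history, warning, action, centre):
    """Single-chart Shewhart rules:
    (i)   the current value falls outside the action limits;
    (ii)  the current and previous values both fall outside the
          warning limits (but within the action limits);
    (iii) nine successive values fall on the same side of the centre line.
    `warning` and `action` are (low, high) tuples."""
    x = history[-1]
    lo_w, hi_w = warning
    lo_a, hi_a = action
    if not lo_a <= x <= hi_a:
        return True                                   # rule (i)
    if len(history) >= 2:
        prev = history[-2]
        if all(not lo_w <= v <= hi_w for v in (x, prev)):
            return True                               # rule (ii)
    last9 = history[-9:]
    if len(last9) == 9 and (all(v > centre for v in last9)
                            or all(v < centre for v in last9)):
        return True                                   # rule (iii)
    return False
```

The two-chart rules that follow could be built in the same way by evaluating this predicate, and the additional joint conditions, on both charts.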

 

Two control charts.  When two different control materials are used in each run, the respective control charts are considered simultaneously. This increases the chance of a type 1 error (rejection of a sound run) but decreases the chance of a type 2 error (acceptance of a flawed run). An out-of-control condition is indicated if any of the following occur.

(i)     At least one of the plotting values falls outside the action limits.

(ii)    Both of the plotting values are outside the warning limits.

(iii)  The current value and the previous plotting value on the same control chart both fall outside the warning limits.

(iv)  Both control charts simultaneously show four successive plotting values falling on the same side of the mean line.

(v)    One of the charts shows nine successive plotting values falling on the same side of the mean line.

A more thorough treatment of the control chart can be obtained by the application of the full Westgard rules, illustrated in Figure 2.

The analytical chemist should respond to an out-of-control condition by cessation of analysis pending diagnostic tests and remedial action followed by rejection of the results of the run and reanalysis of the test materials.

  4. References
  • W A Shewhart, "Economic control of quality in manufactured product", Van Nostrand, New York, 1931.
  • ISO 8258:1991.  "Shewhart control charts".
  • ISO 7873:1993  "Control charts for arithmetic means with warning limits".
  • ISO 7870:1993.  "Control charts - general guide and introduction".
  • ISO 7966:1993.  "Acceptance control charts".
  • S Levey and E R Jennings, Am. J. Clin. Pathol., 1950, 20, 1059-1066.
  • A B J Nix, R J Rowlands, K W Kemp, D W Wilson and K Griffiths,  Stat. Med., 1987, 6, 425-440.
  • J O Westgard, P L Barry and M R Hunt, Clin. Chem., 1981, 27, 493-501.
  • C A Parvin,  Clin.  Chem., 1992, 38, 358-363.
  • J Bishop and A B J Nix,  Clin. Chem., 1993, 39, 1638-1649.
  • W Horwitz, Pure Appl. Chem., (in press).
  • Analytical Methods Committee,  Analyst, 1989, 114, 1693-1697.
  • Analytical Methods Committee,  Analyst, 1989, 114, 1699-1702.

--------

Technical report from the Symposium on the Harmonisation of Quality Assurance Systems for Analytical Laboratories, Washington DC, USA, 22-23 July 1993, sponsored by IUPAC, ISO and AOAC International

Prepared for publication by MICHAEL THOMPSON1 and ROGER WOOD2

1Department of Chemistry, Birkbeck College (University of London), London WC1H OPP, UK

2MAFF Food Science Laboratory, Norwich Research Park, Colney, Norwich NR4 7UQ, UK

1991-95 working group:

Chairman: M. Parkany (Switzerland); Members: T. Anglov (Denmark); K. Bergknut (Norway and Sweden); P. De Bièvre (Belgium); K.-G. von Boroviczény (Germany); J.M. Christensen (Denmark); T.D. Geary (South Australia); R. Greenhalgh (Canada); A.J. Head (United Kingdom); P.T. Holland (New Zealand); W. Horwitz (USA); A. Kallner (Sweden); J. Kristiansen (Denmark); S.H.H. Olrichs (Netherlands); N. Palmer (USA); M. Thompson (United Kingdom); M.J. Vernengo (Argentina); R. Wood (United Kingdom).


    [1]   The model could be extended if necessary to include other features of the analytical system.

    [2] There is no intention here of estimating the standard deviation of repeatability from the IQC data or of comparing estimates: there would usually be too few results for a satisfactory outcome.  Where such an estimate is needed, the formula s = √(Σd²/2n), where d is the difference between duplicate results and n the number of pairs, can be used.

    [3] Where a CRM is not available, traceability only to a reference method or to a batch of a reagent supplied by a manufacturer may be necessary.

Practical guide for the Validation

OIV-MA-AS1-12 Practical guide for the validation, quality control, and uncertainty assessment of an alternative oenological analysis method

Contents

1. Purpose

2. Preamble and scope

3. General vocabulary

4. General principles

4.1 Methodology

4.2 Definition of measurement error

5. Validating a method

5.1 Methodology

5.2 Section one: Scope of method

5.2.1 Definition of analyzable matrices

5.2.2 Detection and quantification limit

5.2.2.1 Normative definition

5.2.2.2 Reference documents

5.2.2.3 Application

5.2.2.4 Procedure

5.2.2.4.1 Determination on blank

5.2.2.4.1.1 Scope

5.2.2.4.1.2 Basic protocol and calculations

5.2.2.4.2 Approach by linearity study

5.2.2.4.2.1 Scope

5.2.2.4.2.2 Basic protocol and calculations

5.2.2.4.3 Graphic approach based on the background noise of the recording

5.2.2.4.3.1 Scope

5.2.2.4.3.2 Basic protocol and calculation

5.2.2.4.4 Checking a predetermined quantification limit

5.2.2.4.4.1 Scope

5.2.2.4.4.2 Basic protocol and calculation

5.2.3 Robustness

5.2.3.1 Definition

5.2.3.2 Determination

5.3 Section two: systematic error study

5.3.1 Linearity study

5.3.1.1 Normative definition

5.3.1.2 Reference documents

5.3.1.3 Application

5.3.1.4 ISO 11095-type approach

5.3.1.4.1 Basic protocol

5.3.1.4.2 Calculations and results

5.3.1.4.2.1 Defining the regression model

5.3.1.4.2.2 Estimating parameters

5.3.1.4.2.3 Charts

5.3.1.4.2.4 Test of the linearity assumption

5.3.1.4.2.4.1 Definitions of errors linked to calibration

5.3.1.4.2.4.2 Fischer-Snedecor test

5.3.1.5 ISO 8466-type approach

5.3.1.5.1 Basic protocol

5.3.1.5.2 Calculations and results

5.3.1.5.2.1 Defining the linear regression model

5.3.1.5.2.2 Defining the polynomial regression model

5.3.1.5.2.3 Comparing residual standard deviations

5.3.2 Specificity

5.3.2.1 Normative definition

5.3.2.2 Application

5.3.2.3 Procedures

5.3.2.3.1 Standard addition test

5.3.2.3.1.1 Scope

5.3.2.3.1.2 Basic protocol

5.3.2.3.1.3 Calculations and results

5.3.2.3.1.3.1 Study of the regression line r = a + b.v

5.3.2.3.1.3.2 Analysis of the results

5.3.2.3.1.3.3 Overlap line graphics

5.3.2.3.2 Study of the influence of other compounds on the measurement result

5.3.2.3.2.1 Scope

5.3.2.3.2.2 Basic protocol and calculations

5.3.2.3.2.3 Interpretation

5.3.3 Study of method accuracy

5.3.3.1 Presentation of the step

5.3.3.1.1 Definition

5.3.3.1.2 General principles

5.3.3.1.3 Reference documents

5.3.3.2 Comparison of the alternative method with the OIV reference method

5.3.3.2.1 Scope

5.3.3.2.2 Accuracy of the alternative method compared with the reference method

5.3.3.2.2.1 Definition

5.3.3.2.2.2 Scope

5.3.3.2.2.3 Basic protocol and calculations

5.3.3.2.2.4 Interpretation

5.3.3.3 Comparison by interlaboratory tests

5.3.3.3.1 Scope

5.3.3.3.2 Basic protocol and calculations

5.3.3.3.3 Interpretation

5.3.3.4 Comparison with reference materials

5.3.3.4.1 Scope

5.3.3.4.2 Basic protocol and calculations

5.3.3.4.3 Interpretation

5.4 Section three: random error study

5.4.1 General principle

5.4.2 Reference documents

5.4.3 Precision of the method

5.4.3.1 Definition

5.4.3.2 Scope

5.4.3.3 General theoretical case

5.4.3.3.1 Basic protocol and calculations

5.4.3.3.1.1 Calculations with several test materials

5.4.3.3.1.2 Calculations with 1 test material

5.4.3.4 Repeatability

5.4.3.4.1 Definitions

5.4.3.4.2 Scope

5.4.3.4.3 Basic protocol and calculations

5.4.3.4.3.1 General case

5.4.3.4.3.2 Particular case applicable to only 1 repetition

5.4.3.4.4 Comparison of repeatability

5.4.3.4.4.1 Determination of the repeatability of each method

5.4.3.4.4.2 Fischer-Snedecor test

5.4.3.5 Intralaboratory reproducibility

5.4.3.5.1 Definition

5.4.3.5.2 Scope

5.4.3.5.3 Basic protocol and calculations

6. Quality control of analysis methods (IQC)

6.1 Reference documents

6.2 General principles

6.3 Reference materials

6.4 Checking the analytical series

6.4.1 Definition

6.4.2 Checking accuracy using reference materials

6.4.3 Intraseries precision

6.4.4 Internal standard

6.5 Checking the analysis system

6.5.1 Definition

6.5.2 Shewhart chart

6.5.2.1 Data acquisition

6.5.2.2 Presentation of results and definition of limits

6.5.2.3 Using the Shewhart chart

6.5.3 Internal comparison of analysis systems

6.5.4 External comparison of the analysis system

6.5.4.1 Analysis chain of interlaboratory comparisons

6.5.4.2 Comparison with external reference materials

6.5.4.2.1 Standard uncertainty of reference material

6.5.4.2.2 Defining the validity limits of measuring reference material

7. Assessment of measurement uncertainty

7.1 Definition

7.2 Reference documents

7.3 Scope

7.4 Methodology

7.4.1 Definition of the measurand, and description of the quantitative analysis
      method

7.4.2 Critical analysis of the measurement process

7.4.3 Estimation calculations of standard uncertainty
      (intralaboratory approach)

7.4.3.1 Principle

7.4.3.2 Calculating the standard deviation of intralaboratory reproducibility

7.4.3.3 Estimating typical sources of systematic errors not taken into account under
      reproducibility conditions

7.4.3.3.1 Gauging error (or calibration error)

7.4.3.3.1.1 Procedure

7.4.3.3.1.2 Calculations and results

7.4.3.3.1.3 Estimating the standard uncertainty associated with the gauging line
        (or calibration line)

7.4.3.3.2 Bias error

7.4.3.3.2.1 Methods adjusted with only one certified reference material

7.4.3.3.2.2 Methods adjusted with several reference materials (gauging ranges etc)

7.4.3.3.3 Matrix effect

7.4.3.3.3.1 Definition

7.4.3.3.4 Sample effect

7.4.4 Estimating standard uncertainty by interlaboratory tests

7.4.4.1 Principle

7.4.4.2 Using the standard deviation of interlaboratory and intramethod
      reproducibility SRinter (method)

7.4.4.3 Using the standard deviation of interlaboratory and intermethod
      reproducibility SRinter

7.4.4.4 Other components in the uncertainty budget

7.5 Expressing expanded uncertainty

  1. Purpose

The purpose of this guide is to assist oenological laboratories carrying out serial analysis as part of their validation, internal quality control and uncertainty assessment initiatives concerning the standard methods they use.

  2. Preamble and scope

International standard ISO 17025, defining the "General Requirements for the Competence of Testing and Calibration Laboratories", states that accredited laboratories implementing an alternative analytical method must ensure the quality of the results obtained. To do so, it indicates several steps. The first step consists of defining the customers' requirements concerning the parameter in question, in order to determine, thereafter, whether the method used meets those requirements. The second step includes initial validation for non-standardized, modified or laboratory-developed methods. Once the method is applied, the laboratories must use inspection and traceability methods in order to monitor the quality of the results obtained. Finally, they must assess the uncertainty of the results obtained.

In order to meet these requirements, the laboratories have a significant reference system at their disposal comprising a large number of international guides and standards. However, in practice, the application of these texts is delicate because, since they address every category of calibration and test laboratory, they remain very general and presuppose, on the reader's part, in-depth knowledge of the mathematical rules applicable to statistical data processing.

This guide is based on this international reference system, taking into account the specific characteristics of oenology laboratories routinely carrying out analyses on series of must or wine samples. Defining the scope of application in this way enabled a relevant choice of suitable tools to be made, in order to retain only those methods most suitable for that scope. Since it is based on the international reference system, this guide is therefore strictly compliant with it. Readers, however, wishing to study certain points of the guide in greater detail can do so by referring to the international standards and guides, the references for which are given in each chapter.

The authors have chosen to combine the various tools meeting the requirements of the ISO 17025 standard since there is an obvious continuity in their application, and the data obtained with certain tools can often be used with the others. In addition, the mathematical resources used are often similar.

The various chapters include application examples, taken from oenology laboratories using these tools.

It is important to point out that this guide does not claim to be exhaustive. It is only designed to present, in as clear and applicable a way as possible, the requirements of the ISO 17025 standard and the basic resources that can be implemented in a routine laboratory to meet them. Each laboratory remains perfectly free to supplement these tools or to replace them by others that it considers more efficient or more suitable.

Finally, the reader’s attention should be drawn to the fact that the tools presented do not constitute an end in themselves and that their use, as well as the interpretation of the results to which they lead, must always be subject to critical analysis. It is only under these conditions that their relevance can be guaranteed, and laboratories will be able to use them as tools to improve the quality of the analyses they carry out.

  3. General vocabulary

The definitions below, as used in this document, are taken from the normative references given in the bibliography.

Analyte

Object of the analysis method

Blank

Test carried out in the absence of a matrix (reagent blank) or on a matrix which does not contain the analyte (matrix blank).

Bias

Difference between the expected test results and an accepted reference value.

Uncertainty budget

The list of uncertainty sources and their associated standard uncertainties, established in order to assess the combined standard uncertainty associated with a measurement result.

Gauging (of a measuring instrument)

Material positioning of each reference mark (or certain principal reference marks only) of a measuring instrument according to the corresponding value of the measurand.

NOTE  "gauging" and "calibration" are not to be confused

Repeatability conditions

Conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time.

Reproducibility conditions (intralaboratory)

Conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same or different operator(s) using different gauges on different days.

Experimental standard deviation

For a series of n measurements of the same measurand, the quantity s characterizing the dispersion of the results, given by the formula:

s = sqrt( Σi (xi − x̄)² / (n − 1) )

xi being the result of the ith measurement and x̄ the arithmetic mean of the n results considered.
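As a minimal numerical sketch (the five measurement values below are hypothetical, chosen only to illustrate the estimator; they do not come from this guide), the formula can be computed directly:

```python
import math

# Hypothetical series of n = 5 measurements of the same measurand
x = [4.12, 4.15, 4.10, 4.13, 4.11]
n = len(x)

mean = sum(x) / n  # arithmetic mean of the n results
# Experimental standard deviation, with n - 1 in the denominator
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

print(round(mean, 3), round(s, 4))
```

The same value is returned by `statistics.stdev`, which also uses the n − 1 denominator.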

Repeatability standard deviation

Standard deviation of many repetitions obtained in a single laboratory by the same operator on the same instrument, i.e. under repeatability conditions.

Internal reproducibility standard deviation (or total intralaboratory variability)

Standard deviation of repetitions obtained in a single laboratory with the same method, using several operators or instruments and, in particular, by taking measurements on different dates, i.e. under reproducibility conditions.

Random error

Result of a measurement minus the mean that would result from an infinite number of measurements of the same measurand carried out under reproducibility conditions.

Measurement error

Result of a measurement minus a true value of the measurand.

Systematic error

Mean error that would result from an infinite number of measurements of the same measurand carried out under reproducibility conditions minus a true value of the measurand.

NOTE  Error is a highly theoretical concept in that it calls upon values that are not accessible in practice, in particular the true values of measurands. On principle, the error is unknown.

Mathematical expectation

For a series of n measurements of the same measurand, if n tends towards infinity, the mean tends towards the expectation E(x).

Calibration

Series of operations establishing under specified conditions the relation between the values of the quantity indicated by a measuring instrument or system, or the values represented by a materialized measurement or a reference material, and the corresponding values of the quantity measured by standards.

Intralaboratory evaluation of an analysis method

Action which consists in submitting an analysis method to an intralaboratory statistical study, based on a standardized and/or recognized protocol, demonstrating that within its scope, the analysis method meets pre-established performance criteria.

Within the framework of this document, the evaluation of a method is based on an intralaboratory study, which includes the comparison with a reference method.

Precision

Closeness of agreement between independent test results obtained under prescribed conditions

Note 1 Precision depends only on the distribution of random errors and does not have any relationship with the true or specified value.

Note 2  The measurement of precision is expressed on the basis of the standard deviation of the test results.

Note 3 The expression "independent test results" refers to results obtained such that they are not influenced by a previous result on the same or a similar test material. Quantitative measurements of precision are critically dependent upon the prescribed conditions. Repeatability and reproducibility conditions are particular sets of extreme conditions.

Quantity (measurable)

An attribute of a phenomenon, body or substance that may be distinguished qualitatively and determined quantitatively.

Uncertainty of measurement

A parameter associated with the result of a measurement, which characterizes the dispersion of the values that could reasonably be attributed to the measurand.

Standard uncertainty (u(xi))

Uncertainty of the result of a measurement expressed in the form of a standard deviation.

Accuracy

Closeness of agreement between the mean value obtained starting from a broad series of test results and an accepted reference value.

Note  The measurement of accuracy is generally expressed in terms of bias.

Detection limit

Lowest amount of an analyte to be examined in a test material that can be detected and regarded as different from the blank value (with a given probability), but not necessarily quantified. In fact, two risks must be taken into account:

  • the risk α of considering that the substance is present in the test material when its quantity is null;
  • the risk β of considering that the substance is absent from the test material when its quantity is not null.

Quantification limit

Lowest amount of an analyte to be examined in a test material that can be quantitatively determined under the experimental conditions described in the method with a defined variability (given coefficient of variation).

Linearity

The ability of a method of analysis, within a certain range, to provide an instrumental response or results proportional to the quantity of analyte to be determined in the laboratory sample.

This proportionality is expressed by an a priori defined mathematical expression.

The linearity limits are the experimental limits of concentrations between which a linear calibration model can be applied with a known confidence level (generally taken to be equal to 1%).

Test material

Material or substance to which a measurement can be applied with the analysis method under consideration.

Reference material

Material or substance one or more of whose property values are sufficiently homogeneous and well established to be used for the calibration of an apparatus, the assessment of a measurement method, or for assigning values to materials.

Certified reference material

Reference material, accompanied by a certificate, one or more of whose property values are certified by a procedure which establishes its traceability to an accurate realization of the unit in which the property values are expressed, and for which each certified value is accompanied by an uncertainty at a stated level of confidence.

Matrix

All the constituents of the test material other than the analyte.

Analysis method

Written procedure describing all the means and procedures required to carry out the analysis of the analyte, i.e.: scope, principle and/or reactions, definitions, reagents, apparatus, procedures, expression of results, precision, test report.

WARNING The expressions "titration method" and "determination method" are sometimes used as synonyms for the expression "analysis method". These two expressions should not be used in this way.

 

Quantitative analysis method

Analysis method making it possible to measure the analyte quantity present in the laboratory test material.

Reference analysis method (Type I or Type II methods)

Method, which gives the accepted reference value for the quantity of the analyte to be measured.

 

Non-classified alternative method of analysis

A routine analysis method used by the laboratory and not considered to be a reference method.

NOTE  An alternative method of analysis can consist in a simplified version of the reference method.

Measurement

Set of operations having the object of determining a value of a quantity.

Note  The operations can be carried out automatically.

 

Measurand

Particular quantity subject to measurement.

Mean

For a series of n measurements of the same measurand, the mean value is given by the formula:

x̄ = (1/n) Σi xi

xi being the result of the ith measurement.

Result of a measurement

Value assigned to a measurand, obtained by measurement

Sensitivity

Ratio between the variation of the information value of the analysis method and the variation of the analyte quantity.

The variation of the analyte quantity is generally obtained by preparing various standard solutions, or by adding the analyte to a matrix.

Note 1 Defining, by extension, the sensitivity of a method as its capacity to detect small quantities should be avoided.

Note 2 A method is said to be "sensitive" if a small variation of the analyte quantity produces a significant variation in the information value.

Measurement signal

Quantity that represents the measurand and is functionally linked to it.

Specificity

Property of an analysis method to respond exclusively to the determination of the quantity of the analyte considered, with the guarantee that the measured signal comes only from the analyte.

Tolerance

Deviation from the reference value, as defined by the laboratory for a given level, within which a measured value of a reference material can be accepted.

Value of a quantity

Magnitude of a particular quantity generally expressed as a unit of measurement multiplied by a number.

 

True value of a quantity

Value compatible with the definition of a given particular quantity.

Note 1 The value that would be obtained if the measurement was perfect

Note 2 Any true value is by nature indeterminate

Accepted reference value

A value that serves as an agreed-upon reference for comparison and which is derived as:

a) a theoretical or established value, based on scientific principles;

b) an assigned or certified value, based on experimental work of some national or international organization;

c) a consensus or certified value, based on collaborative experimental work under the auspices of a scientific or engineering group;

Within the particular framework of this document, the accepted reference value (or conventionally true value) of the test material is given by the arithmetic mean of the values of measurements repeated as per the reference method.

Variance

Square of the standard deviation.

  4. General principles

4.1.  Methodology

When developing a new alternative method, the laboratory implements a protocol that includes several steps. The first step, applied once at the initial stage or repeated on a regular basis, is the validation of the method. This step is followed by permanent quality control. All the data collected during these two steps are used to assess the quality of the method and to evaluate the measurement uncertainty. The latter, regularly assessed, is an indicator of the quality of the results obtained by the method under consideration.

 

All these steps are inter-connected and constitute a global approach that can be used to assess and control measurement errors.

4.2.  Definition of measurement error

Any measurement carried out using the method under study gives a result which is inevitably associated with a measurement error, defined as being the difference between the result obtained and the true value of the measurand. In practice, the true value of the measurand is inaccessible and a value conventionally accepted as such is used instead.

The measurement error includes two components:

Analysis result = true value + systematic error + random error

In practice, the systematic error results in a bias in relation to the true value, the random error being all the errors associated with the application of the method.


 

The validation and quality control tools are used to evaluate the systematic errors and the random errors, and to monitor their changes over time.

 

  5. Validating a method

 

5.1.  Methodology

Implementing the validation comprises 3 steps, each with its own objectives. To meet these objectives, the laboratory has a set of validation tools at its disposal. Several tools may be available for a given objective, each suited to different situations. It is up to the laboratory to choose the tools most suitable for the method to be validated.

Steps                      Objectives                                Tools for validation

Scope of application       - To define the analyzable matrices       Detection and quantification limits
                           - To define the analyzable range          Robustness study

Systematic error or bias   - Linear response in the range of         Linearity study
                             analyzable values
                           - Specificity of the method               Specificity study
                           - Accuracy of the method                  Comparison with a reference method
                                                                     Comparison with reference materials
                                                                     Interlaboratory comparison

Random error               - Precision of the method                 Repeatability study
                                                                     Intralaboratory reproducibility study

5.2.  Section one: Scope of method

5.2.1.      Definition of analyzable matrices

The matrix comprises all constituents in the test material other than the analyte.

If these constituents are liable to influence the result of a measurement, the laboratory should define the matrices on which the method is applicable.

For example, in oenology, the determination of certain parameters can be influenced by the various possible matrices (wines, musts, sweet wines, etc.).

In case of doubt about a matrix effect, more in-depth studies can be carried out as part of the specificity study.

5.2.2.      Detection and quantification limit

This step is of course not applicable and not necessary for those methods whose lower limit does not tend towards 0, such as alcoholic strength by volume in wines, total acidity in wines, pH, etc.

5.2.2.1.                        Normative definition

The detection limit is the lowest amount of analyte that can be detected but not necessarily quantified as an exact value. The detection limit is a parameter of limit tests.

The quantification limit is the lowest quantity of the compound that can be determined using the method.

5.2.2.2.                        Reference documents

  • NF V03-110 Standard, intralaboratory validation procedure for an alternative method in relation to a reference method.
  • International compendium of analysis methods – OIV, Assessment of the detection and quantification limit of an analysis method (Oeno resolution 7/2000).

5.2.2.3.                        Application

In practice, the quantification limit is generally more relevant than the detection limit, the latter being by convention 1/3 of the first.

There are several approaches for assessing the detection and quantification limits:

  • Determination on blank
  • Approach by the linearity study
  • Graphic approach

These methods are suitable for various situations, but in every case they are mathematical approaches giving results of informative value only. It seems crucial, whenever possible, to introduce a check of the value obtained, whether by one of these approaches or estimated empirically, using the checking protocol for a predetermined quantification limit.

5.2.2.4.                        Procedure

5.2.2.4.1.                 Determination on blank

5.2.2.4.1.1.           Scope

This method can be applied when the blank analysis gives results with a non-zero standard deviation. The operator will judge the advisability of using reagent blanks, or matrix blanks.

If the blank, for reasons related to uncontrolled signal preprocessing, is sometimes not measurable or does not offer a recordable variation (standard deviation of 0), the operation can be carried out on a very low concentration of analyte, close to the blank.

5.2.2.4.1.2.           Basic protocol and calculations

Carry out the analysis of n test materials assimilated to blanks, n being equal to or higher than 10.

  • Calculate the mean x̄ of the results obtained:

x̄ = (1/n) Σi xi

  • Calculate the standard deviation s of the results obtained:

s = sqrt( Σi (xi − x̄)² / (n − 1) )

  • From these results the detection limit is conventionally defined by the formula:

DL = x̄ + 3 s

  • From these results the quantification limit is conventionally defined by the formula:

QL = x̄ + 10 s

 

Example: The table below gives some of the results obtained when assessing the detection limit for the usual determination of free sulfur dioxide.

 

Test material #    x (mg/l)
1                  0
2                  1
3                  0
4                  1.5
5                  0
6                  1
7                  0.5
8                  0
9                  0
10                 0.5
11                 0
12                 0

The calculated values are as follows:

  • n = 12
  • x̄ = 0.375 mg/l
  • s = 0.528 mg/l
  • DL = 1.96 mg/l
  • QL = 5.65 mg/l
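The example above can be reproduced in a few lines of Python (a sketch, assuming the conventional formulas DL = x̄ + 3s and QL = x̄ + 10s given in the protocol):

```python
import math

# Results (mg/l) of the 12 blank test materials from the example above
x = [0, 1, 0, 1.5, 0, 1, 0.5, 0, 0, 0.5, 0, 0]
n = len(x)

mean = sum(x) / n  # mean of the blank results
# Standard deviation of the blanks (n - 1 denominator)
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

dl = mean + 3 * s    # conventional detection limit
ql = mean + 10 * s   # conventional quantification limit

print(round(mean, 3), round(s, 3), round(dl, 2), round(ql, 2))
```

The rounded values match the figures given above (DL = 1.96 mg/l, QL = 5.65 mg/l).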

 

5.2.2.4.2.                 Approach by linearity study

5.2.2.4.2.1.           Scope

This method can be applied in all cases, and is required when the analysis method does not involve background noise. It uses the data calculated during the linearity study.

Note This statistical approach may be biased and give pessimistic results when linearity is calculated on a very wide range of values for reference materials, and whose measurement results include variable standard deviations. In such cases, a linearity study limited to a range of low values, close to 0 and with a more homogeneous distribution will result in a more relevant assessment.

5.2.2.4.2.2.           Basic protocol and calculations

Use the results obtained during the linearity study which made it possible to calculate the parameters of the calibration line y = a + b·x

The data to be recovered from the linearity study are (see chapter 5.3.1, linearity study):

  • slope of the regression line: b

  • residual standard deviation: Sres

  • standard deviation at the intercept point (to be calculated):

Sa = Sres · sqrt( 1/(n·p) + x̄² / ( p · Σi (xi − x̄)² ) )

where n is the number of reference materials, p the number of replicas, xi the accepted values of the reference materials and x̄ their mean.

The estimates of the detection limit DL and the quantification limit QL are calculated using the following formulae:

Estimated detection limit:

DL = 3 Sa / b

Estimated quantification limit:

QL = 10 Sa / b

Example: Estimation of the detection and quantification limits in the determination of sorbic acid by capillary electrophoresis, based on linearity data acquired on a range from 1 to 20 mg.L-1.

X (ref)    Y1      Y2      Y3      Y4
1          1.9     0.8     0.5     1.5
2          2.4     2       2.5     2.1
3          4       2.8     3.5     4
4          5.3     4.5     4.7     4.5
5          5.3     5.3     5.2     5.3
10         11.6    10.88   12.1    10.5
15         16      15.2    15.5    16.1
20         19.7    20.4    19.5    20.1

Number of reference materials: n = 8

Number of replicas: p = 4

Regression line (y = a + b·x): b = 0.9972; a = 0.51102

Residual standard deviation: Sres = 0.588

Standard deviation at the intercept point: Sa = 0.1597

The estimated detection limit is DL = 0.48 mg.L-1

The estimated quantification limit is QL = 1.6 mg.L-1
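As an illustrative sketch (not part of the OIV text), the figures of this example can be reproduced from the raw data; the standard deviation at the intercept point is computed here with the classical formula Sa = Sres · sqrt(1/(np) + x̄²/Σ(x − x̄)²), which reproduces the values above:

```python
import math

# Sorbic acid linearity data: accepted values (mg/l) and p = 4 replicas
data = {
    1:  [1.9, 0.8, 0.5, 1.5],
    2:  [2.4, 2.0, 2.5, 2.1],
    3:  [4.0, 2.8, 3.5, 4.0],
    4:  [5.3, 4.5, 4.7, 4.5],
    5:  [5.3, 5.3, 5.2, 5.3],
    10: [11.6, 10.88, 12.1, 10.5],
    15: [16.0, 15.2, 15.5, 16.1],
    20: [19.7, 20.4, 19.5, 20.1],
}
n, p = len(data), 4
N = n * p  # total number of individual measurements

xs = [x for x, reps in data.items() for _ in reps]
ys = [y for reps in data.values() for y in reps]

# Least-squares regression line y = a + b*x over all N points
xbar, ybar = sum(xs) / N, sum(ys) / N
sxx = sum((x - xbar) ** 2 for x in xs)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
a = ybar - b * xbar

# Residual standard deviation (N - 2 degrees of freedom)
q_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_res = math.sqrt(q_res / (N - 2))

# Standard deviation at the intercept point
s_a = s_res * math.sqrt(1 / N + xbar ** 2 / sxx)

dl = 3 * s_a / b     # estimated detection limit
ql = 10 * s_a / b    # estimated quantification limit

print(round(b, 4), round(a, 5), round(s_res, 3), round(s_a, 4))
print(round(dl, 2), round(ql, 1))
```

Rounded, this gives b = 0.9972, a = 0.51102, Sres = 0.588, Sa = 0.1597, DL = 0.48 mg/l and QL = 1.6 mg/l, in agreement with the example.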

5.2.2.4.3.                 Graphic approach based on the background noise of the recording

5.2.2.4.3.1.           Scope

This approach can be applied to analysis methods that provide a graphic recording (chromatography, etc.) with a background noise. The limits are estimated from a study of the background noise.

5.2.2.4.3.2.           Basic protocol and calculation

Record a certain number of reagent blanks, using 3 series of 3 injections separated by several days.

Determine the following values:

  • hmax, the greatest variation in amplitude on the y-axis of the signal observed between two acquisition points, excluding drift, over a distance equal to twenty times the width at mid-height of the peak corresponding to the analyte, centered on the retention time of the compound under study;
  • R, the quantity/signal response factor, expressed in height.

The detection limit DL and the quantification limit QL are calculated according to the following formulae:

DL = 3 hmax R

QL = 10 hmax R

 

5.2.2.4.4.                 Checking a predetermined quantification limit

This approach can be used to validate a quantification value obtained by statistical or empirical approach.

5.2.2.4.4.1.           Scope

This method can be used to check that a given quantification limit is a priori acceptable. It is applicable when the laboratory can procure at least 10 test materials with known quantities of analyte, at the level of the estimated quantification limit.

In the case of methods with a specific signal, not sensitive to matrix effects, the materials can be synthetic solutions whose reference value is obtained by formulation.

In all other cases, wines (or musts) shall be used whose measurand value as obtained by the reference method is equal to the limit to be studied. Of course, in this case the quantification limit of the reference method must be lower than this value.

5.2.2.4.4.2.           Basic protocol and calculation

Analyze n independent test materials whose accepted value is equal to the quantification limit to be checked; n must at least be equal to 10.

  • Calculate the mean of the n measurements:

x̄QL = (1/n) Σi xi

  • Calculate the standard deviation of the n measurements:

sQL = sqrt( Σi (xi − x̄QL)² / (n − 1) )

xi being the results of the measurements of the test materials.

The two following conditions must be met:

a) the measured mean quantity x̄QL must not be different from the predetermined quantification limit QL:

If |x̄QL − QL| < 10 · sQL / √n, then the quantification limit QL is considered to be valid.

Note  10 is a purely conventional value relating to the QL criterion.

b) the quantification limit must be other than 0:

If 5 · sQL < QL, then the quantification limit is other than 0.

A value of 5 corresponds to an approximate value for the spread of the standard deviation, taking into account risk α and risk β, to ensure that the QL is other than 0.

This is equivalent to checking that the coefficient of variation for QL is lower than 20%.

Note Remember that the detection limit is obtained by dividing the quantification limit by 3.

Note 2 A check should be made to ensure that the value of sQL is not too large (which would produce an artificially positive test) and effectively corresponds to a reasonable standard deviation of the variability of the results for the level under consideration. It is up to the laboratory to make this critical evaluation of the value of sQL.

Example: Checking the quantification limit of the determination of malic acid by the enzymatic method.

Estimated quantification limit: 0.1 g.L-1

 

Wine    Value (g/l)
1       0.1
2       0.1
3       0.09
4       0.1
5       0.09
6       0.08
7       0.08
8       0.09
9       0.09
10      0.08

Mean: x̄QL = 0.090

Standard deviation: sQL = 0.008

 

 

First condition: |0.090 − 0.1| = 0.010 < 10 × 0.008 / √10 ≈ 0.025. The quantification limit of 0.1 is considered to be valid.

Second condition: 5 × 0.008 = 0.04 < 0.1. The quantification limit is considered to be significantly different from 0.
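The two acceptance conditions can be checked programmatically. The sketch below (an illustration, assuming the criteria |x̄ − QL| < 10·s/√n and 5·s < QL) uses the ten malic acid values of the example:

```python
import math

# Malic acid values (g/l) measured on 10 wines at the estimated
# quantification limit QL = 0.1 g/l (example above)
x = [0.1, 0.1, 0.09, 0.1, 0.09, 0.08, 0.08, 0.09, 0.09, 0.08]
ql = 0.1
n = len(x)

mean = sum(x) / n
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

# Condition a: the measured mean must not differ from QL
cond_a = abs(mean - ql) < 10 * s / math.sqrt(n)

# Condition b: QL must be significantly different from 0
# (equivalent to a coefficient of variation below 20 %)
cond_b = 5 * s < ql

print(round(mean, 3), round(s, 3), cond_a, cond_b)
```

Both conditions hold for these data, so the QL of 0.1 g/l is accepted.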

5.2.3.      Robustness

5.2.3.1.                        Definition

Robustness is the capacity of a method to give close results in the presence of slight changes in the experimental conditions likely to occur during the use of the procedure.

5.2.3.2.                        Determination

If there is any doubt about the influence of variations in the operational parameters, the laboratory can apply experiment schedules (experimental designs), enabling these critical operating parameters to be tested within the variation range likely to occur under practical conditions. In practice, these tests are difficult to implement.

5.3.  Section two: systematic error study

5.3.1.      Linearity study

5.3.1.1.                        Normative definition

The linearity of a method is its ability (within a given range) to provide an informative value or results proportional to the amount of analyte to be determined in the test material.

5.3.1.2.                        Reference documents

  • NF V03-110 standard. Intralaboratory validation procedure of an alternative method in relation to a reference method.
  • ISO 11095 Standard, linear calibration using reference materials.
  • ISO 8466-1 Standard, Water quality – Calibration and evaluation of analytical methods and estimation of performance characteristics

5.3.1.3.                        Application

The linearity study can be used to define and validate a linear dynamic range.

This study is possible when the laboratory has stable reference materials whose accepted values have been acquired with certainty (in theory these values should have an uncertainty equal to 0). These could therefore be internal reference materials titrated with calibrated material, wines or musts whose value is given by the mean of at least 3 repetitions of the reference method, external reference materials or certified external reference materials.

In the last case, and only in this case, this study also ensures the traceability of the method. The experiment schedule used here can then be considered as a calibration.

In all cases, it is advisable to ensure that the matrix of the reference material is compatible with the method.

Lastly, calculations must be made with the final result of the measurement and not with the value of the signal.

Two approaches are proposed here:

  • An ISO 11095 type of approach, the principle of which consists in comparing the residual error with the experimental error using a Fisher test. This approach is valid above all for relatively narrow ranges (in which the measurand does not vary by more than a factor of 10). In addition, under experimental conditions generating a low reproducibility error, the test becomes excessively severe. On the other hand, in the case of poor experimental conditions, the test will easily be positive and will also lose its relevance. This approach requires good homogeneity of the number of measurements over the entire range studied.
  • An ISO 8466 type of approach, the principle of which consists in comparing the residual error caused by the linear regression with the residual error produced by a polynomial regression (of order 2 for example) applied to the same data. If the polynomial model gives a significantly lower residual error, a conclusion of nonlinearity could be drawn. This approach is appropriate in particular when there is a risk of high experimental dispersion at one end of the range. It is therefore naturally well-suited to analysis methods for traces. There is no need to work with a homogeneous number of measurements over the whole range, and it is even recommended to increase the number of measurements at the borders of the range.

5.3.1.4.                        ISO 11095-type approach

5.3.1.4.1.                 Basic protocol

It is advisable to use a number n of reference materials, with n higher than 3; there is, however, no need to exceed 10. Each reference material should be measured p times, under reproducibility conditions, with p higher than 3 (a value of 5 is generally recommended). The accepted values of the reference materials should be regularly distributed over the range of values studied, and the number of measurements must be identical for all the reference materials.

Note  It is essential that the reproducibility conditions bring into play a maximum of potential sources of variability, even at the risk that the test indicates non-linearity in an excessive way.

The results are reported in a table presented as follows:

Reference    Accepted            Measured values
material     reference value     Replica 1   ...   Replica j   ...   Replica p

1            x1                  y11         ...   y1j         ...   y1p
...          ...                 ...         ...   ...         ...   ...
i            xi                  yi1         ...   yij         ...   yip
...          ...                 ...         ...   ...         ...   ...
n            xn                  yn1         ...   ynj         ...   ynp

5.3.1.4.2.                 Calculations and results

5.3.1.4.2.1.           Defining the regression model

The model to be calculated and tested is as follows:

yij = a + b·xi + eij

where

  • yij is the jth replica of the ith reference material.
  • xi is the accepted value of the ith reference material.
  • b is the slope of the regression line.
  • a is the intercept point of the regression line.

a + b·xi represents the expectation of the measurement value of the ith reference material.

eij is the difference between yij and the expectation of the measurement value of the ith reference material.

 

5.3.1.4.2.2.           Estimating parameters

The parameters of the regression line are obtained using the following formulae:

  • mean of the p measurements of the ith reference material:

ȳi = (1/p) Σj yij

  • mean of all the accepted values of the n reference materials:

x̄ = (1/n) Σi xi

  • mean of all the measurements:

ȳ = (1/(n·p)) Σi Σj yij

  • estimated slope b:

b = Σi (xi − x̄)(ȳi − ȳ) / Σi (xi − x̄)²

  • estimated intercept point a:

a = ȳ − b·x̄

  • regression value ŷi associated with the ith reference material:

ŷi = a + b·xi

  • residual eij:

eij = yij − ŷi

5.3.1.4.2.3.           Charts

The results can be presented and analyzed in graphic form. Two types of charts are used.

  • The first graph is the representation of the measured values against the accepted values of the reference materials. The calculated regression line is also plotted.

  • The second graph is the representation of the residual values against the estimated values ŷi of the reference materials, as given by the regression line.

The second graph is a good indicator of the deviation in relation to the linearity assumption: the linear dynamic range is valid if the residual values are fairly distributed between the positive and negative values.

In case of doubt about the linearity of the regression, a Fisher-Snedecor test can be carried out, in addition to the graphic analysis, in order to test the assumption: "the linear dynamic range is not valid".

5.3.1.4.2.4.           Test of the linearity assumption

Several error values linked to calibration should be defined first of all; these can be estimated using the data collected during the experiment. A statistical test is then performed on the basis of these results, making it possible to test the assumption of non-validity of the linear dynamic range: this is the Fisher-Snedecor test.

Definitions of errors linked to calibration

These errors are given as a standard deviation, resulting from the square root of the ratio between a sum of squares and a degree of freedom.

Residual error

The residual error corresponds to the error between the measured values and the values given by the regression line.

The sum of the squares of the residual error is as follows: SSres = Σi Σj (yij – ŷi)²

The number of degrees of freedom is np – 2.

The residual standard deviation is then estimated by the formula: Sres = √(SSres / (np – 2))

Experimental error

The experimental error corresponds to the reproducibility standard deviation of the experimentation.

The sum of the squares of the experimental error is as follows: SSexp = Σi Σj (yij – ȳi)²

The number of degrees of freedom is np – n.

The experimental standard deviation (reproducibility) is then estimated by the formula: Sexp = √(SSexp / (np – n))

Note  This quantity is sometimes also noted SR.

Adjustment error

The adjustment error corresponds to the part of the residual error that is not explained by the experimental error.

The sum of the squares of the adjustment error is: SSdef = SSres – SSexp

or SSdef = Σi p·(ȳi – ŷi)²

The number of degrees of freedom is n – 2.

The standard deviation of the adjustment error is estimated by the formula: Sdef = √(SSdef / (n – 2))

or Sdef = √(Σi p·(ȳi – ŷi)² / (n – 2))

Fisher-Snedecor test

The ratio Fobs = S²def / S²exp obeys the Fisher-Snedecor law with the degrees of freedom n – 2 and np – n.

The calculated experimental value Fobs is compared with the limit value F(1 – α; n – 2, np – n), extracted from the Fisher-Snedecor table. The value for α used in practice is generally 5%.

If Fobs ≥ F(1 – α; n – 2, np – n), the assumption of the non-validity of the linear dynamic range is accepted (with a risk of error α of 5%).

If Fobs < F(1 – α; n – 2, np – n), the assumption of the non-validity of the linear dynamic range is rejected.

Example: Linearity study for the determination of tartaric acid by capillary electrophoresis. 9 reference materials are used. These are synthetic solutions of tartaric acid, titrated by means of a scale traceable to standard masses.

Ref. material   Ti (ref)   Y1      Y2      Y3      Y4
1               0.38       0.41    0.37    0.40    0.41
2               1.15       1.15    1.12    1.16    1.17
3               1.72       1.72    1.63    1.76    1.71
4               2.41       2.45    2.37    2.45    2.45
5               2.91       2.95    2.83    2.99    2.95
6               3.91       4.09    3.86    4.04    4.04
7               5.91       6.07    5.95    6.04    6.04
8               7.91       8.12    8.01    8.05    7.90
9               9.91       10.20   10.00   10.09   9.87

Regression line (y = a + b·x)

b = 1.01565

a = –0.00798

Errors related to calibration

Residual standard deviation Sres = 0.07161

Standard deviation of experimental reproducibility Sexp = 0.07536

Standard deviation of the adjustment error Sdef = 0.0548

Interpretation, Fisher-Snedecor test

Fobs = S²def / S²exp = 0.53 < F(95%; 7, 27) = 2.37

The assumption of the non-validity of the linear dynamic range is rejected.
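The calculations of this section can be cross-checked with a short script. The sketch below (plain Python, no external libraries; variable names are illustrative) reproduces the tartaric acid example using the slope, intercept and error formulae defined above:

```python
# Sketch of the linearity calculations (ISO 11095-type approach),
# applied to the tartaric acid example; variable names are illustrative.
from math import sqrt

# accepted values Ti and the p = 4 replicate measurements of each reference material
data = [
    (0.38, [0.41, 0.37, 0.40, 0.41]),
    (1.15, [1.15, 1.12, 1.16, 1.17]),
    (1.72, [1.72, 1.63, 1.76, 1.71]),
    (2.41, [2.45, 2.37, 2.45, 2.45]),
    (2.91, [2.95, 2.83, 2.99, 2.95]),
    (3.91, [4.09, 3.86, 4.04, 4.04]),
    (5.91, [6.07, 5.95, 6.04, 6.04]),
    (7.91, [8.12, 8.01, 8.05, 7.90]),
    (9.91, [10.20, 10.00, 10.09, 9.87]),
]
n = len(data)            # number of reference materials
p = 4                    # replicates per material

y_bar = [sum(ys) / p for _, ys in data]          # mean per material
T_bar = sum(t for t, _ in data) / n              # mean of accepted values
y_all = sum(y_bar) / n                           # mean of all measurements

# least-squares slope and intercept (equal replicate counts per material)
b = (sum((t - T_bar) * (yb - y_all) for (t, _), yb in zip(data, y_bar))
     / sum((t - T_bar) ** 2 for t, _ in data))
a = y_all - b * T_bar

ss_res = sum((y - (a + b * t)) ** 2 for t, ys in data for y in ys)
ss_exp = sum((y - yb) ** 2 for (_, ys), yb in zip(data, y_bar) for y in ys)

s_res = sqrt(ss_res / (n * p - 2))               # residual standard deviation
s_exp = sqrt(ss_exp / (n * p - n))               # experimental (reproducibility) sd
s_def = sqrt((ss_res - ss_exp) / (n - 2))        # adjustment error sd
f_obs = s_def ** 2 / s_exp ** 2                  # compared with F(n - 2, np - n)
```

Running this yields b ≈ 1.0157, a ≈ –0.008, Sres ≈ 0.072, Sexp ≈ 0.075, Sdef ≈ 0.055 and Fobs ≈ 0.53, in line with the values reported above.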

5.3.1.5.                        ISO 8466-type approach

5.3.1.5.1.                 Basic protocol

It is advisable to use a number n of reference materials. The number must be higher than 3, but there is no need, however, to exceed 10. The reference materials should be measured several times, under reproducibility conditions. The number of measurements may be small at the center of the range studied (minimum = 2) and must be greater at both ends of the range, for which a minimum number of 4 is generally recommended. The accepted values of reference materials must be regularly distributed over the studied range of values.

Note  It is vital that the reproducibility conditions use the maximum number of potential sources of variability.

The results are reported in a table presented as follows:

Reference materials   Accepted value   Measured values
                                       Replica 1   Replica 2   ...   Replica j   ...   Replica p
1                     x1               y11         y12         ...   y1j         ...   y1p
...                   ...              ...         ...         ...   ...         ...   ...
i                     xi               yi1         yi2         ...   yij         ...   yip
...                   ...              ...         ...         ...   ...         ...   ...
n                     xn               yn1         yn2         ...   ynj         ...   ynp

5.3.1.5.2.                 Calculations and results

5.3.1.5.2.1.           Defining the linear regression model

Calculate the linear regression model using the calculations detailed above.

The residual error of the standard deviation for the linear model Sres can then be calculated using the formula indicated in § 5.3.1.4.2.4.1

5.3.1.5.2.2.           Defining the polynomial regression model

The calculation of the polynomial model of order 2 is given below.

The aim is to determine the parameters of the polynomial regression model of order 2, y = a + b·x + c·x², applicable to the data of the experiment schedule.

The purpose is to determine the parameters a, b and c. This determination can generally be computerized using spreadsheets and statistics software.

The estimation formulae for these parameters follow from the least-squares method.

Once the model has been established, the following values are to be calculated:

  • Regression value associated with the ith reference material: ŷi = a + b·xi + c·xi²

  • residual: eij = yij – ŷi

Residual standard deviation of the polynomial model: S'res = √(Σi Σj (yij – ŷi)² / (N – 3))

Comparing residual standard deviations

Calculation of DS² = (N – 2)·S²res – (N – 3)·S'²res

Then PG = DS² / S'²res

The value PG is compared with the limit value F(1 – α; 1, N – 3) given by the Fisher-Snedecor table for a confidence level 1 – α and the degrees of freedom 1 and (N – 3).

Note In general the α risk used is 5%. In some cases the test may be optimistic and a risk of 10% will prove more realistic.

If PG ≤ F: the nonlinear calibration function does not result in a significantly improved adjustment; the calibration function is linear.

If PG > F: the working range should be narrowed as far as possible so as to obtain a linear calibration function; otherwise, the information values of the analyzed samples must be evaluated using a nonlinear calibration function.

Example: Theoretical case.

 

Ref. material   Ti (ref)   Y1      Y2      Y3      Y4
1               35         22.6    19.6    21.6    18.4
2               62         49.6    49.8    53.0    –
3               90         105.2   103.5   –       –
4               130        149.0   149.8   –       –
5               205        203.1   202.5   197.3   –
6               330        297.5   298.6   307.1   294.2

Linear regression: y = 1.48x – 0.0015

Sres = 13.625

Polynomial regression: y = –0.0015x² + 1.485x – 27.2701

S'res = 7.407

Fisher's test

PG = 10.534 > F(5%) = 10.128

PG > F: the linear calibration function cannot be retained.
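Once the two residual standard deviations are known, the comparison itself is a short computation. A minimal sketch, assuming the ISO 8466-1 form DS² = (N – 2)·Sy1² – (N – 3)·Sy2² with Sy1 the linear and Sy2 the polynomial residual standard deviation and N the number of calibration levels:

```python
def pg_statistic(s_lin: float, s_poly: float, n_levels: int) -> float:
    """ISO 8466-1-style comparison of the linear and order-2 residual standard deviations."""
    ds2 = (n_levels - 2) * s_lin ** 2 - (n_levels - 3) * s_poly ** 2
    return ds2 / s_poly ** 2

# values from the theoretical example above (6 reference materials)
pg = pg_statistic(13.625, 7.407, 6)
# pg is about 10.5, to be compared with F(1 - alpha; 1, N - 3) = 10.128 at 5%
```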

5.3.2.      Specificity

5.3.2.1.                        Normative definition

The specificity of a method is its ability to measure only the compound being searched for.

5.3.2.2.                        Application

In case of doubt about the specificity of the tested method, the laboratory can use experiment schedules designed to check its specificity. Two types of complementary experiments are proposed here that can be used in a large number of cases encountered in the field of oenology.

  • The first test is the standard addition test. It can be used to check that the method measures all the analyte.
  • The second test can be used to check the influence of other compounds on the result of the measurement.

5.3.2.3.                        Procedures

5.3.2.3.1.                 Standard addition test

5.3.2.3.1.1.           Scope

This test can be used to check that the method measures all the analyte.

The experiment schedule is based on standard additions of the compound being searched for. It can only be applied to methods that are not sensitive to matrix effects.

5.3.2.3.1.2.           Basic protocol

This consists in recovering, in a significant manner, the quantities added to test materials analyzed before and after the additions.

Carry out variable standard additions on n test materials. The initial analyte concentrations of the test materials and the standard additions are selected so as to cover the scope of the method. These test materials must consist of the types of matrices encountered in routine analysis. It is advised to use at least 10 test materials.

The results are reported in a table presented as follows:

Test material   Quantity before addition (x)   Quantity added (v)   Quantity after addition (w)   Quantity found (r)
1               x1                             v1                   w1                            r1 = w1 – x1
...             ...                            ...                  ...                           ...
i               xi                             vi                   wi                            ri = wi – xi
...             ...                            ...                  ...                           ...
n               xn                             vn                   wn                            rn = wn – xn

 

Note 1 An addition is made with a pure standard solution. It is advised to perform an addition of the same order as the quantity of the test material on which it is carried out. This is why the most concentrated test materials must be diluted to remain within the scope of the method.

Note 2 It is advised to prepare the additions using independent standard solutions, in order to avoid any systematic error.

Note 3 The quality of values x and w can be improved by using several repetitions.

5.3.2.3.1.3.           Calculations and results

The principle of the measurement of specificity consists in studying the regression line r = a + b.v and checking that slope b is equivalent to 1 and that intercept point a is equivalent to 0.

 

5.3.2.3.1.3.1.    Study of the regression line r = a + b.v

The parameters of the regression line are obtained using the following formulae:

  • mean of the added quantities: v̄ = (1/n)·Σ vi

  • mean of the quantities found: r̄ = (1/n)·Σ ri

  • estimated slope: b = Σ (vi – v̄)(ri – r̄) / Σ (vi – v̄)²

  • estimated intercept point: a = r̄ – b·v̄

  • regression value associated with the ith addition: r̂i = a + b·vi

  • residual standard deviation: Sres = √(Σ (ri – r̂i)² / (n – 2))

  • standard deviation on the slope: Sb = Sres / √(Σ (vi – v̄)²)

  • standard deviation on the intercept point: Sa = Sres · √(1/n + v̄² / Σ (vi – v̄)²)

5.3.2.3.1.3.2.    Analysis of the results

The purpose is to conclude on the absence of any interference and on an acceptable specificity. This is true if the overlap line r = a + bv is equivalent to the line y = x.

To do so, two tests are carried out:

  • Test of the assumption that slope b of the overlap line is equal to 1.
  • Test of the assumption that intercept point a is equal to 0.

These assumptions are tested using a Student test, generally associated with a risk of error of 1%. A risk of 5% can prove more realistic in some cases.

Let T(dof; 1%) be the two-sided Student variable associated with a risk of error of 1% for a number of degrees of freedom (dof).

Step 1: calculations

Calculation of the comparison criterion on the slope at 1: Tb = |b – 1| / Sb

Calculation of the comparison criterion on the intercept point at 0: Ta = |a| / Sa

Calculation of the Student critical value: Tcritical = T(n – 2; 1%), two-sided

Step 2: interpretation

  • If Tb is lower than Tcritical, then the slope of the regression line is equivalent to 1.
  • If Ta is lower than Tcritical, then the intercept point of the regression line is equivalent to 0.

If both conditions are true, then the overlap line is equivalent to the line y = x, and the method is deemed to be specific.

Note 1 Based on these results, a mean overlap rate can be calculated to quantify the specificity. In no case should it be used to "correct" the results. This is because if a significant bias is detected, the alternative method cannot be validated in relation to an efficiency rate of 100%.

Note 2 Since the principle of the test consists in calculating a straight line, at least three levels of addition have to be taken, and their value must be correctly chosen in order to obtain an optimum distribution of the points.
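The two Student tests above can be sketched in a few lines. The data set below is hypothetical (ten standard additions with near-complete recovery), and the critical value 3.355 is the two-sided Student value for 8 degrees of freedom at 1%, taken from tables:

```python
from math import sqrt

# hypothetical standard additions v and quantities found r (illustrative data)
v = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
r = [0.52, 0.99, 1.51, 2.02, 2.48, 3.01, 3.52, 3.98, 4.49, 5.01]
n = len(v)

v_bar = sum(v) / n
r_bar = sum(r) / n
s_vv = sum((x - v_bar) ** 2 for x in v)

b = sum((x - v_bar) * (y - r_bar) for x, y in zip(v, r)) / s_vv   # slope
a = r_bar - b * v_bar                                             # intercept

s_res = sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(v, r)) / (n - 2))
s_b = s_res / sqrt(s_vv)                         # sd on the slope
s_a = s_res * sqrt(1 / n + v_bar ** 2 / s_vv)    # sd on the intercept

t_b = abs(b - 1) / s_b          # comparison criterion: slope against 1
t_a = abs(a) / s_a              # comparison criterion: intercept against 0
t_critical = 3.355              # Student, two-sided, 8 dof, 1% (from tables)

specific = t_b < t_critical and t_a < t_critical
```

With this illustrative data set both criteria stay well below the critical value, so the method would be deemed specific.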

5.3.2.3.1.3.3.    Overlap line graphics

Example of specificity

 

5.3.2.3.2.                 Study of the influence of other compounds on the measurement result

5.3.2.3.2.1.           Scope

If the laboratory suspects the interaction of compounds other than the analyte, an experiment schedule can be set up to test the influence of various compounds. The experiment schedule proposed here enables a search for the influence of compounds defined a priori: thanks to its knowledge of the analytical process and its know-how, the laboratory should be able to define a certain number of compounds liable to be present in the wine and to influence the analytical result.

5.3.2.3.2.2.           Basic protocol and calculations

Analyze n wines in duplicate, before and after the addition of the compound suspected of having an influence on the analytical result; n must at least be equal to 10.

The mean values Mxi of the two measurements xi and x'i made before the addition shall be calculated first, then the mean values Myi of the two measurements yi and y'i made after the addition, and finally the difference di between the values Mxi and Myi.

The results of the experiment can be reported as indicated in the following table:

Samples   x: Before addition   y: After addition   Means          Difference
          Rep1      Rep2       Rep1      Rep2      x      y       d
1         x1        x'1        y1        y'1       Mx1    My1     d1 = Mx1 – My1
...       ...       ...        ...       ...       ...    ...     ...
i         xi        x'i        yi        y'i       Mxi    Myi     di = Mxi – Myi
...       ...       ...        ...       ...       ...    ...     ...
n         xn        x'n        yn        y'n       Mxn    Myn     dn = Mxn – Myn

The mean of the results before addition: Mx = (1/n)·Σ Mxi

The mean of the results after addition: My = (1/n)·Σ Myi

Calculate the mean of the differences: Md = (1/n)·Σ di

Calculate the standard deviation of the differences: Sd = √(Σ (di – Md)² / (n – 1))

Calculate the Zscore = |Md| / Sd

5.3.2.3.2.3.           Interpretation

If the Zscore is ≤ 2, the added compound can be considered to have a negligible influence on the result of the analysis, with a risk of error of 5%.

If the Zscore is > 2, the added compound can be considered to influence the result of the analysis, with a risk of error of 5%.

Note Interpreting the Zscore is possible under the assumption that the variations obey a normal law with a 95% confidence rate.

Example: Study of the interaction of compounds liable to be present in the samples on the determination of glucose and fructose in wines by Fourier transform infrared spectrophotometry (FTIR).

Wine   Before addition   + 250 mg.L-1 potassium sorbate   + 1 g.L-1 salicylic acid   Differences
       rep1    rep2      rep1    rep2                     rep1    rep2               sorbate   salicylic
1      6.2     6.2       6.5     6.3                      5.3     5.5                0.2       -0.8
2      1.2     1.2       1.3     1.2                      0.5     0.6                0.05      -0.65
3      0.5     0.6       0.5     0.5                      0.2     0.3                -0.05     -0.3
4      4.3     4.2       4.1     4.3                      3.8     3.9                -0.05     -0.4
5      12.5    12.6      12.5    12.7                     11.5    11.4               0.05      -1.1
6      5.3     5.3       5.4     5.3                      4.2     4.3                0.05      -1.05
7      2.5     2.5       2.6     2.5                      1.5     1.4                0.05      -1.05
8      1.2     1.3       1.2     1.1                      0.5     0.4                -0.1      -0.8
9      0.8     0.8       0.9     0.8                      0.2     0.3                0.05      -0.55
10     0.6     0.6       0.5     0.6                      0.1     0.0                -0.05     -0.55

Potassium sorbate:   Md = 0.02     Sd = 0.086   Zscore = 0.23 < 2

Salicylic acid:      Md = –0.725   Sd = 0.282   Zscore = 2.57 > 2

In conclusion, it can be stated that potassium sorbate does not influence the determination of glucose and fructose by the FTIR gauging studied here. On the other hand, salicylic acid does have an influence, and care should be taken to avoid samples containing salicylic acid in order to remain within the scope of validity of the gauging under study.
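The Md / Sd / Zscore computation for the potassium sorbate column can be checked with a few lines of plain Python (d is taken here as the mean after addition minus the mean before; the sign convention does not affect the Zscore):

```python
from math import sqrt

# duplicate measurements before addition and after adding 250 mg/L potassium sorbate
before = [(6.2, 6.2), (1.2, 1.2), (0.5, 0.6), (4.3, 4.2), (12.5, 12.6),
          (5.3, 5.3), (2.5, 2.5), (1.2, 1.3), (0.8, 0.8), (0.6, 0.6)]
after = [(6.5, 6.3), (1.3, 1.2), (0.5, 0.5), (4.1, 4.3), (12.5, 12.7),
         (5.4, 5.3), (2.6, 2.5), (1.2, 1.1), (0.9, 0.8), (0.5, 0.6)]

d = [sum(a) / 2 - sum(b) / 2 for a, b in zip(after, before)]
n = len(d)

md = sum(d) / n                                        # mean of the differences
sd = sqrt(sum((x - md) ** 2 for x in d) / (n - 1))     # sd of the differences
z = abs(md) / sd                                       # Zscore
```

This gives Md = 0.02, Sd ≈ 0.086 and a Zscore ≈ 0.23 < 2, matching the table: the sorbate addition has no significant influence.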

5.3.3.      Study of method accuracy

5.3.3.1.                        Presentation of the step

5.3.3.1.1.                 Definition

Closeness of agreement between the mean value obtained from a large series of test results and an accepted reference value.

5.3.3.1.2.                 General principles

When the reference value is output by a certified system, the accuracy study can be regarded as a traceability link. This applies to two specific cases in particular:

  • Traceability to certified reference materials: in this case, the accuracy study can be undertaken jointly with the linearity and calibration study, using the experiment schedule described for that study.
  • Traceability to a certified interlaboratory comparison analysis chain.

The other cases, i.e. which use references that are not based on certified systems, are the most widespread in routine oenological laboratories. These involve comparisons:

  • Comparison with a reference method
  • Comparison with the results of an uncertified interlaboratory comparison analysis chain.
  • Comparison with internal reference materials, or with external uncertified reference materials.

5.3.3.1.3.                 Reference documents

  • NF V03-110 Standard, Intralaboratory validation procedure for an alternative method in relation to a reference method.
  • NF V03-115 Standard, Guide for the use of reference materials.
  • ISO 11095 Standard, Linear calibration using reference materials.
  • ISO 8466-1 Standard, Water quality – Calibration and evaluation of analytical methods and estimation of performance characteristics.
  • ISO 5725 Standard, Accuracy (trueness and precision) of measurement methods and results.

5.3.3.2.                        Comparison of the alternative method with the OIV reference method

5.3.3.2.1.                 Scope

This method can be applied if the laboratory uses the OIV reference method, or a traced, validated method, whose performance quality is known and meets the requirements of the laboratory’s customers.

To study the comparative accuracy of the two methods, it is advisable first of all to ensure the quality of the repeatability of the method to be validated, and to compare it with the reference method. The method for carrying out the repeatability comparison is described in the chapter on repeatability.

 

5.3.3.2.2.                 Accuracy of the alternative method compared with the reference method

5.3.3.2.2.1.           Definition

Accuracy is defined as the closeness of agreement between the values obtained by the reference method and those obtained by the alternative method, independently of the errors of precision of the two methods.

5.3.3.2.2.2.           Scope

The accuracy of the alternative method in relation to the reference method is established for a field of application in which the repeatabilities of the two methods are constant.

In practice, it is therefore often advisable to divide the analyzable range of values into several sections or "range levels" (2 to 5), in which we may reasonably consider that the repeatabilities of the methods are comparable to a constant.

5.3.3.2.2.3.           Basic protocol and calculations

In each range level, accuracy is studied on a series of n test materials whose analyte concentration values cover the range level in question. A minimum number of 10 test materials is required to obtain significant results.

Each test material is to be analyzed in duplicate by the two methods under repeatable conditions.

A calculation is to be made of the mean values Mxi of the two measurements xi and x'i made using the alternative method and the mean values Myi of the two measurements yi and y'i made using the reference method; the difference di is then to be calculated between the values Mxi and Myi.

The results of the experiment can be reported as in the following table:

Test material   x: Alternative method   y: Reference method   Means          Difference
                Rep1      Rep2          Rep1      Rep2        x      y       d
1               x1        x'1           y1        y'1         Mx1    My1     d1 = Mx1 – My1
...             ...       ...           ...       ...         ...    ...     ...
i               xi        x'i           yi        y'i         Mxi    Myi     di = Mxi – Myi
...             ...       ...           ...       ...         ...    ...     ...
n               xn        x'n           yn        y'n         Mxn    Myn     dn = Mxn – Myn

The following calculations are to be made:

  • The mean of the results for the alternative method: Mx = (1/n)·Σ Mxi

  • The mean of the results for the reference method: My = (1/n)·Σ Myi

  • The mean of the differences: Md = (1/n)·Σ di

  • The standard deviation of the differences: Sd = √(Σ (di – Md)² / (n – 1))

  • The Zscore = |Md| / Sd

5.3.3.2.2.4.           Interpretation

  • If the Zscore is lower than or equal to 2.0, it can be concluded that the accuracy of one method in relation to the other is satisfactory, in the range level under consideration, with a risk of error α = 5%.
  • If the Zscore is higher than 2.0, it can be concluded that the alternative method is not accurate in relation to the reference method, in the range level under consideration, with a risk of error α = 5%.

Note Interpreting the Zscore is possible under the assumption that the variations obey a normal law with a 95% confidence rate.

Example: Study of the accuracy of FTIR gauging to determine glucose and fructose in relation to the enzymatic method. The first range level covers the scale from 0 to 5 g.L-1 and the second range level covers a scale from 5 to 20 g.L-1.

Wine   FTIR 1   FTIR 2   Enz 1   Enz 2   di
1      0        0.3      0.3     0.2     -0.1
2      0.2      0.3      0.1     0.1     0.2
3      0.6      0.9      0.0     0.0     0.7
4      0.7      1        0.8     0.7     0.1
5      1.2      1.6      1.1     1.3     0.2
6      1.3      1.4      1.3     1.3     0.0
7      2.1      2        1.9     2.1     0.0
8      2.4      0        1.1     1.2     0.1
9      2.8      2.5      2.0     2.6     0.3
10     3.5      4.2      3.7     3.8     0.1
11     4.4      4.1      4.1     4.4     0.0
12     4.8      5.4      5.5     5.0     -0.2

Md = 0.13

Sd = 0.23

Zscore = 0.55 < 2

Wine   FTIR 1   FTIR 2   Enz 1   Enz 2   di
1      5.1      5.4      5.1     5.1     0.1
2      5.3      5.7      5.3     6.0     -0.2
3      7.7      7.6      7.2     7.0     0.6
4      8.6      8.6      8.3     8.5     0.2
5      9.8      9.9      9.1     9.3     0.6
6      9.9      9.8      9.8     10.2    -0.1
7      11.5     11.9     13.3    13.0    -1.4

For the two range levels, the Zscore is lower than 2. The FTIR gauging for the determination of glucose and fructose studied here can be considered accurate in relation to the enzymatic method.
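The first range level can be verified the same way (plain Python; di recomputed from the raw duplicates rather than the rounded column):

```python
from math import sqrt

# (FTIR rep1, FTIR rep2, enzymatic rep1, enzymatic rep2) for the 0-5 g/L range level
wines = [(0.0, 0.3, 0.3, 0.2), (0.2, 0.3, 0.1, 0.1), (0.6, 0.9, 0.0, 0.0),
         (0.7, 1.0, 0.8, 0.7), (1.2, 1.6, 1.1, 1.3), (1.3, 1.4, 1.3, 1.3),
         (2.1, 2.0, 1.9, 2.1), (2.4, 0.0, 1.1, 1.2), (2.8, 2.5, 2.0, 2.6),
         (3.5, 4.2, 3.7, 3.8), (4.4, 4.1, 4.1, 4.4), (4.8, 5.4, 5.5, 5.0)]

# difference between the mean of the alternative (FTIR) and reference (enzymatic) duplicates
d = [(x1 + x2) / 2 - (y1 + y2) / 2 for x1, x2, y1, y2 in wines]
n = len(d)

md = sum(d) / n                                        # mean of the differences
sd = sqrt(sum((x - md) ** 2 for x in d) / (n - 1))     # sd of the differences
z = abs(md) / sd                                       # Zscore
```

This gives Md ≈ 0.13, Sd ≈ 0.23 and Zscore ≈ 0.55 < 2, as in the table above.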

5.3.3.3.                        Comparison by interlaboratory tests

5.3.3.3.1.                 Scope

Interlaboratory tests are of two types:

  • Collaborative studies relate to a single method. These studies are carried out for the initial validation of a new method, mainly in order to define the standard deviation of interlaboratory reproducibility of the method. The mean m can also be given.
  • Interlaboratory comparison analysis chains, or aptitude tests. These tests are carried out for the validation of a method adopted by the laboratory, and for routine quality control (see § 5.3.3.3). The resulting values are the interlaboratory mean m and the interlaboratory, inter-method reproducibility standard deviation SR-inter.

By participating in an analysis chain or in a collaborative study, the laboratory can exploit the results to study the accuracy of a method, first in order to ensure its validation, and then for its routine quality control.

If the interlaboratory tests are carried out within the framework of a certified organization, this comparison work can be used for method traceability.

5.3.3.3.2.                 Basic protocol and calculations

To obtain a sufficient comparison, it is recommended to use a minimum number of 5 test materials over the period.

For each test material, two results are provided:

  • The mean m of all the laboratories with significant results
  • The standard deviation for interlaboratory reproducibility SR-inter

The test materials are analyzed with p replicas by the laboratory, these replicas being carried out under repeatable conditions. p must at least be equal to 2.

In addition, the laboratory must be able to check that the intralaboratory variability (intralaboratory reproducibility) is lower than the interlaboratory variability (interlaboratory reproducibility) given by the analysis chain.

For each test material, the laboratory calculates the Zscore, given by the following formula: Zscore = |Mlab – m| / SR-inter, where Mlab is the mean of the p replicas carried out by the laboratory.

The results can be reported as indicated in the following table:

Test material   Rep1   ...   Rep j   ...   Rep p   Lab mean   Chain mean   Standard deviation   Zscore
1               x11    ...   x1j     ...   x1p     Mlab(1)    m1           SR-inter(1)          Z1
...             ...    ...   ...     ...   ...     ...        ...          ...                  ...
i               xi1    ...   xij     ...   xip     Mlab(i)    mi           SR-inter(i)          Zi
...             ...    ...   ...     ...   ...     ...        ...          ...                  ...
n               xn1    ...   xnj     ...   xnp     Mlab(n)    mn           SR-inter(n)          Zn

5.3.3.3.3.                 Interpretation

If all the Zscore values are lower than 2, the results of the method being studied can be considered identical to those obtained by the laboratories having produced significant results.

Note Interpreting the Zscore is possible under the assumption that the variations obey a normal law with a 95% confidence rate.

Example: An interlaboratory analysis chain outputs the following results for the free sulfur dioxide parameter, on two samples.

Samples   Rep1   Rep2   Rep3   Rep4   Lab mean   Chain mean   Standard deviation   Zscore
1         34     34     33     34     33.75      32           6                    0.29 < 2
2         26     27     26     26     26.25      24           4                    0.56 < 2

It can be concluded that on these two samples, the comparison with the analysis chain is satisfactory.
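The comparison above reduces to the following sketch (the formula Zscore = |Mlab – m| / SR-inter is taken from the worked values):

```python
def chain_z_score(lab_reps, chain_mean, s_r_inter):
    """Zscore of the laboratory mean against the interlaboratory chain mean."""
    m_lab = sum(lab_reps) / len(lab_reps)
    return abs(m_lab - chain_mean) / s_r_inter

# free sulfur dioxide example above
z1 = chain_z_score([34, 34, 33, 34], 32, 6)   # sample 1
z2 = chain_z_score([26, 27, 26, 26], 24, 4)   # sample 2
```

z1 ≈ 0.29 and z2 ≈ 0.56, both below 2.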

5.3.3.4.                        Comparison with reference materials

5.3.3.4.1.                 Scope

In situations where there is no reference method (or any other method) for a given parameter, and the parameter is not processed by the analysis chains, the only remaining possibility is comparison of the results of the method to be validated with accepted internal or external material reference values.

The reference materials, for example, could be synthetic solutions established with class-A glassware, and/or calibrated metrology apparatus.

In the case of certified reference materials, the comparison constitutes the traceability value, and can be carried out at the same time as the gauging and linearity study.

5.3.3.4.2.                 Basic protocol and calculations

It is advisable to have n reference materials for a given range level, in which it can be reasonably estimated that repeatability is comparable to a constant; n must at least be equal to 10.

Analyze each reference material in duplicate.

Calculate the mean values Mxi of the two measurements xi and x'i carried out using the alternative method.

Define the accepted value for the reference material.

The results can be reported as indicated in the following table:

Reference material   x: Alternative method        T: Accepted value   Difference
                     Rep1   Rep2   Mean x                             d
1                    x1     x'1    Mx1            T1                  d1 = Mx1 – T1
...                  ...    ...    ...            ...                 ...
i                    xi     x'i    Mxi            Ti                  di = Mxi – Ti
...                  ...    ...    ...            ...                 ...
n                    xn     x'n    Mxn            Tn                  dn = Mxn – Tn

The mean of the results of the alternative method: Mx = (1/n)·Σ Mxi

The mean of the accepted values of the reference materials: MT = (1/n)·Σ Ti

Calculate the mean of the differences: Md = (1/n)·Σ di

Calculate the standard deviation of the differences: Sd = √(Σ (di – Md)² / (n – 1))

Calculate the Zscore = |Md| / Sd

5.3.3.4.3.                 Interpretation

  • If the Zscore is lower than or equal to 2.0, it can be concluded that the accuracy of the alternative method in relation to the accepted values of the reference materials is good in the range level under consideration.
  • If the Zscore is higher than 2.0, it can be concluded that the alternative method is not accurate in relation to the accepted values of the reference materials in the range level under consideration.

Note Interpreting the Zscore is possible under the assumption that the variations obey a normal law with a 95% confidence rate.

Example: There is no reference method with which to compare the results of the analysis of 4-ethylphenol (4-EP) by gas chromatography coupled with mass spectrometry (GC-MS). The results are compared with the accepted values of reference materials consisting of synthetic solutions formulated with traced equipment.

Ref. material   Ti (ref)   Y1      Y2      Y3      Y4      My      di
1               4.62       6.2     6.56    4.9     5.7     5.8     1.2
2               12.3       15.1    10.94   12.3    11.6    12.5    0.2
3               24.6       24.5    18      25.7    27.8    24.0    -0.6
4               46.2       48.2    52.95   46.8    35      45.7    -0.5
5               77         80.72   81.36   83.2    74.5    79.9    2.9
6               92.4       97.6    89      94.5    99.5    95.2    2.8
7               123.2      126.6   129.9   119.6   126.9   125.8   2.6
8               246.4      254.1   250.9   243.9   240.4   247.3   0.9
9               385        375.8   366.9   380.4   386.9   377.5   -7.5
10              462        467.5   454.5   433.3   457.3   453.2   -8.9

Md = –0.7

Sd = 4.16

Zscore = 0.16

Given these results, the values obtained by the analysis method for 4-EP by GC-MS can be considered accurate compared with the accepted values of reference materials.

5.4.  Section three: random error study

5.4.1.      General principle

Random error is approximated using precision studies. Precision is calculated using a methodology that can be applied under various experimental conditions, ranging between those of repeatability and those of reproducibility, which constitute the extreme conditions of its measurement.

The precision study is one of the essential items in the study of the uncertainty of measurement.

5.4.2.      Reference documents

  • ISO 5725 Standard, Accuracy (trueness and precision) of measurement methods and results
  • NF V03-110 Standard, Intralaboratory validation procedure for an alternative method in relation to a reference method.

5.4.3.      Precision of the method

5.4.3.1.                        Definition

Closeness of agreement between independent test results obtained under prescribed conditions.

Note 1  Precision depends only on the distribution of the random errors and has no relation with the true or specified value.

Note 2 Expressing the measurement of precision is based on the standard deviation of the test results.

Note 3 The term "independent test results" refers to results obtained such that they are not influenced by a previous result on the same or similar test material. Quantitative measurements of precision are critically dependent on the prescribed conditions. Repeatability and reproducibility conditions are particular sets of extreme conditions.

In practice, precision refers to all the experimental conditions ranging between the conditions of repeatability and those of reproducibility.

5.4.3.2.                        Scope

The protocols and calculations are detailed below, from the general theoretical case to the specific cases of repeatability and reproducibility. This exhaustive approach should make it possible to apply the precision study in most laboratory situations.

The precision study can be applied a priori without difficulty to every quantitative method.

In many cases, precision is not constant throughout the validity range for the method. In this case, it is advisable to define several sections or "range levels", in which we may reasonably consider that the precision is comparable to a constant. The calculation of precision is to be reiterated for each range level.

5.4.3.3.                        General theoretical case

5.4.3.3.1.                 Basic protocol and calculations

5.4.3.3.1.1.           Calculations with several test materials

n test materials are analyzed over a relatively long period of time with several replicas, pi being the number of replicas for the ith test material. The properties of the test materials must remain constant throughout the period in question.

For each replica, the measurement can be made with K repetitions, (we do not take into account the case here where the number of repetitions K can vary from one test material to the other, which would complicate the calculations even more).

The total number of replicas must be higher than 10, distributed over all the test materials.

The results can be reported as indicated in the following table (case in which K = 2):

 

Test materials   Replicas
                 1             ...   j             ...   pi
1                x11   x'11    ...   x1j   x'1j    ...   x1p1   x'1p1
...              ...           ...   ...           ...   ...
i                xi1   x'i1    ...   xij   x'ij    ...   xipi   x'ipi
...              ...           ...   ...           ...   ...
n                xn1   x'n1    ...   xnj   x'nj    ...   xnpn   x'npn

In this situation, the standard deviation of total variability (or standard deviation of precision Sv) is given by the general expression:

Sv² = SD² + (1 – 1/K)·Sr²

where:

SD² is the variance of the means of the K repetitions, calculated over the replicas of all test materials.

Sr² is the variance of repeatability of all the repetitions.

  • If the test materials were analyzed in duplicate with each replica (K = 2), the expression becomes: Sv² = SD² + Sr²/2

  • When only one measurement of the test material has been carried out with each replica (K = 1), the variance of repeatability term is null, and the expression becomes: Sv² = SD²

  • Calculation of SD²

The mean of the two repetitions xij and x'ij is: mij = (xij + x'ij)/2

For each test material, the mean of the pi replicas is calculated: Mi = (1/pi)·Σj mij

The total number of different measurements N is the sum of the pi.

The variance SD² is then given by the following equation:

SD² = Σi Σj (mij – Mi)² / (N – n)

Note  This variance can also be calculated using the variances of variability of each test material, SD,i². The following relation is then used (it is strictly equivalent to the previous one):

SD² = Σi (pi – 1)·SD,i² / (N – n)

  • Calculation of Sr²

The variance of repeatability is calculated as a conventional repeatability equation with the N replicas in duplicate. According to the calculation of repeatability discussed in the section entitled "repeatability", for K = 2 the variance of repeatability is:

Sr² = Σi Σj (xij – x'ij)² / (2N)

Precision v is calculated according to the formula: v = 2.8 × Sv

The value of precision v means that in 95% of the cases, the difference between two values obtained by the method, under the conditions defined, will be lower than or equal to v.

Note 1 The use and interpretation of these results is based on the assumption that the variations obey a normal law with a 95% confidence rate.

Note 2 A precision at the 99% confidence level can also be defined, with v' = 3.64 × Sv.

5.4.3.3.1.2.           Calculations with 1 test material

In this situation, the calculations are simpler. It is advisable to carry out p measurement replicas of the test material, if necessary with a repetition of the measurement on each replica. p must at least be equal to 10.

In the following calculations, the measurement is considered to be carried out in duplicate with each replica.

  • The variance SD² is then given by the following equation:

SD² = Σi (x̄i - x̄)² / (p - 1)

where:

x̄i is the mean of the two repetitions of replica i

p is the number of replicas

x̄ is the mean of all the replicas

  • The variance of repeatability Sr² is then given by the following equation:

Sr² = Σi Wi² / (2p)

where Wi is the absolute difference between the two repetitions of replica i.
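Under the same assumptions, the single-test-material case reduces to the two variances above; a minimal Python sketch (illustrative names, not part of the OIV text):

```python
import math

def precision_single_material(pairs):
    """Sv for one test material measured as p replicas in duplicate.

    `pairs` is a list of (x, x') duplicate pairs, one per replica."""
    p = len(pairs)
    means = [(x + xp) / 2 for x, xp in pairs]              # replica means
    grand = sum(means) / p                                 # mean of all replicas
    sd2 = sum((m - grand) ** 2 for m in means) / (p - 1)   # SD²
    sr2 = sum((x - xp) ** 2 for x, xp in pairs) / (2 * p)  # Sr² = ΣWi²/2p
    return math.sqrt(sd2 + sr2 / 2)                        # Sv
```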

5.4.3.4.                        Repeatability

5.4.3.4.1.                 Definitions

Repeatability is the closeness of agreement between mutually-independent analysis results obtained with the method in question on the same wine, in the same laboratory, with the same operator using the same equipment, within a short period of time.

These experimental conditions will be called conditions of repeatability.

The value of repeatability r is the value below which the absolute difference between two results of the same analysis, obtained under the repeatability conditions defined above, is expected to lie with a confidence level of 95%.

The repeatability standard deviation Sr is the standard deviation of results obtained under repeatability conditions; it characterizes the dispersion of those results.

5.4.3.4.2.                 Scope

A priori, the repeatability study can be applied without difficulty to every quantitative method, insofar as the repeatability conditions can be observed.

In many cases, repeatability is not constant throughout the range of validity of the method. It is therefore advisable to define several sections or "range levels", in which we may reasonably consider that the repeatability is comparable to a constant. The repeatability calculation is then to be reiterated for each range level.

5.4.3.4.3.                 Basic protocol and calculations

5.4.3.4.3.1.           General case

The number of test materials may vary in relation to the number of replicas. In practice, we consider that the number of measurements over all test materials must be higher than 20. It is not necessary for the repeatability conditions to be maintained from one test material to another, but all the replicas carried out on the same test material must be carried out under these repeatability conditions.

Repeatability is a special case of the precision calculation. The Sr²/2 term is naturally equal to 0 (only one measurement with each replica), and the calculation of the repeatability standard deviation Sr is the same as the calculation of SD described above. Repeatability r is then calculated according to the formula: r = 2.8 × Sr.

The value r means that in 95% of the cases, the difference between two values acquired under repeatable conditions will be lower than or equal to r.

5.4.3.4.3.2.           Particular case applicable to only 1 repetition

In practice, the most common situation for automated systems is the analysis of test materials with only one repetition. It is advisable to use at least 10 materials in order to reach the 20 measurements required. The two measurement replicas of the same test material must be carried out under repeatability conditions.

In this precise case, the calculation of Sr is simplified and becomes:

Sr = √( Σ Wi² / (2p) )

in which:

Sr = the repeatability standard deviation

p = the number of test materials analyzed in duplicate

Wi = the absolute differences between duplicates

Repeatability r is calculated according to the formula:

r = 2.8 × Sr

 

Example: For the alternative determination method of the free sulfur dioxide in question, and for a range of measurements from 0 to 50 mg/l, the operator will seek at least 10 samples with regularly distributed concentrations ranging between these values.

Sample no. | xi (mg/l) | x'i (mg/l) | Wi (absolute value)
1 | 14 | 14 | 0
2 | 25 | 24 | 1
3 | 10 | 10 | 0
4 | 2 | 3 | 1
5 | 35 | 35 | 0
6 | 19 | 19 | 0
7 | 23 | 23 | 0
8 | 27 | 27 | 0
9 | 44 | 45 | 1
10 | 30 | 30 | 0
11 | 8 | 8 | 0
12 | 48 | 46 | 2

Example: Using the values given in the table above, the following results are obtained:

p = 12

Sr = 0.54 mg/l

r = 1.5 mg/l

This result can be used to state that, with a probability of 95%, the difference between two results obtained by the method under study, under repeatability conditions, will be lower than 1.5 mg/l.
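The numbers in this example can be checked with a short Python script (a verification sketch only; the data are the duplicate pairs from the table above):

```python
import math

# duplicate results (xi, x'i) in mg/l from the free-SO2 example
pairs = [(14, 14), (25, 24), (10, 10), (2, 3), (35, 35), (19, 19),
         (23, 23), (27, 27), (44, 45), (30, 30), (8, 8), (48, 46)]

p = len(pairs)                                   # 12 test materials
sum_w2 = sum((x - xp) ** 2 for x, xp in pairs)   # Σ Wi² = 7
sr = math.sqrt(sum_w2 / (2 * p))                 # Sr = √(ΣWi² / 2p)
r = 2.8 * sr                                     # repeatability at 95%

print(round(sr, 2), round(r, 1))                 # → 0.54 1.5
```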

5.4.3.4.4.                 Comparison of repeatability

5.4.3.4.4.1.           Determination of the repeatability of each method

To estimate the performance of a method, it can be useful to compare its repeatability with that of a reference method.

Let Sr be the repeatability standard deviation of the alternative method, and Sref the repeatability standard deviation of the reference method.

The comparison is direct. If the repeatability value of the alternative method is lower than or equal to that of the reference method, the result is positive. If it is higher, the laboratory must ensure that the result remains compliant with the specification it has accepted for the method concerned. In the latter case, it may also apply a Fischer-Snedecor test to determine whether the value found for the alternative method is significantly higher than that of the reference method.

5.4.3.4.4.2.           Fischer-Snedecor test

Calculate the ratio:

Fobs = Sr² / Sref²

Compare it with the critical Snedecor value, with a risk α equal to 0.05, corresponding to the Fischer variable with a confidence level 1 - α and ν1 = N(x) - n and ν2 = N(y) - m degrees of freedom, where N(x) and N(y) are the total numbers of measurements and n and m the numbers of test materials for the alternative and reference methods respectively: F(N(x) - n, N(y) - m, 1 - α). In the case of a repeatability calculated with only one repetition on p test materials for the alternative method, and q test materials for the reference method, the Fischer variable has ν1 = p and ν2 = q degrees of freedom, i.e.: F(p, q, 1 - α).

Interpreting the test:

1/ If Fobs ≥ F(p, q, 1 - α): the repeatability value of the alternative method is significantly higher than that of the reference method.

2/ If Fobs < F(p, q, 1 - α): we cannot state that the repeatability value of the alternative method is significantly higher than that of the reference method.

 

Example: The value of the repeatability standard deviation found for the determination method of free sulfur dioxide is:

 

Sr = 0.54 mg/l

 

The laboratory carried out the determination on the same test materials using the OIV reference method. The value of the repeatability standard deviation found in this case is:

Sref = 0.39 mg/l

 

p = 12

q = 12

Fobs = Sr² / Sref² = 1.93

F(12, 12, 0.95) = 2.69 > 1.93

The Fobs value obtained is lower than the critical value F(12, 12, 0.95); we cannot state that the repeatability value of the alternative method is significantly higher than that of the reference method.
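The comparison can be scripted as follows (a sketch; the critical value F(12, 12, 0.95) = 2.69 is taken from an F-distribution table rather than computed; note that the rounded standard deviations give Fobs = 1.92, where the text quotes 1.93 from unrounded values):

```python
# Fischer-Snedecor comparison of repeatability standard deviations
sr_alt, sr_ref = 0.54, 0.39   # alternative and reference methods (mg/l)
p = q = 12                    # test materials analyzed in duplicate

f_obs = (sr_alt / sr_ref) ** 2
f_crit = 2.69                 # tabulated F(p, q, 0.95) = F(12, 12, 0.95)

significant = f_obs > f_crit  # is the alternative significantly worse?
print(round(f_obs, 2), significant)   # → 1.92 False
```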

5.4.3.5.                        Intralaboratory reproducibility

5.4.3.5.1.                 Definition

Intralaboratory reproducibility is the closeness of agreement between analysis results obtained with the method under consideration on the same wine, in the same laboratory, with the same operator or different operators, using different calibration curves, on different days.

5.4.3.5.2.                 Scope

Reproducibility studies can be implemented on quantitative methods, if the time of analysis is reasonably limited, and if the capacity exists to keep at least one test material stable over time.

In many cases, reproducibility is not constant throughout the validity range of the method. In this case, it is advisable to define several sections or "range levels", in which it can be reasonably considered that reproducibility is comparable to a constant. The reproducibility calculation is then to be reiterated for each range level.

5.4.3.5.3.                 Basic protocol and calculations

The laboratory chooses one or more stable test materials. It applies the method regularly for a period equal to at least one month and keeps the results obtained xij (material i, replica j). A minimum of 5 replicas is recommended for each test material, the total minimum number of replicas being 10. The replicas can be analyzed in duplicate.

The calculation of precision applies fully to the calculation of reproducibility, integrating the Sr²/2 term if the measurements are carried out in duplicate.

Reproducibility R is calculated according to the formula:

R = 2.8 SR

The value R means that in 95% of the cases, the difference between two values acquired under reproducibility conditions will be lower than or equal to R.

Example: Reproducibility study of the determination of sorbic acid in wines by steam distillation and absorbance reading at 256 nm.

Two different wines containing sorbic acid were kept for a period of 3 months. The determination of sorbic acid was carried out at regular intervals over this period, with repetition of each measurement.

Replica | Test material 1 (x1, x2) | Test material 2 (x1, x2)
1 | 122, 125 | 140, 139
2 | 123, 120 | 138, 137
3 | 132, 130 | 139, 141
4 | 121, 115 | 143, 142
5 | 130, 135 | 139, 139
6 | 135, 142 | 135, 138
7 | 137, 135 | 139, 139
8 | 130, 125 | 145, 145
9 | 123, 130 | 138, 137
10 | 112, 115 | 135, 134
11 | 131, 128 | 146, 146
12 | - | 137, 138
13 | - | 146, 147
14 | - | 145, 148
15 | - | 130, 128

p = 2 test materials, analyzed in duplicate (K = 2)

p1 = 11 replicas (test material 1)

p2 = 15 replicas (test material 2)

n = 26

SR = 6.35

R = 17.8
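These figures can be reproduced with the precision formulas above (verification sketch; the data are the duplicate pairs from the table):

```python
import math

# duplicate results (x1, x2) for the two sorbic-acid test materials
mat1 = [(122, 125), (123, 120), (132, 130), (121, 115), (130, 135),
        (135, 142), (137, 135), (130, 125), (123, 130), (112, 115),
        (131, 128)]
mat2 = [(140, 139), (138, 137), (139, 141), (143, 142), (139, 139),
        (135, 138), (139, 139), (145, 145), (138, 137), (135, 134),
        (146, 146), (137, 138), (146, 147), (145, 148), (130, 128)]

p = 2                                  # number of test materials
n = len(mat1) + len(mat2)              # 26 replicas in all
ss_means, ss_w = 0.0, 0.0
for reps in (mat1, mat2):
    means = [(a + b) / 2 for a, b in reps]
    mi = sum(means) / len(means)
    ss_means += sum((m - mi) ** 2 for m in means)
    ss_w += sum((a - b) ** 2 for a, b in reps)

sd2 = ss_means / (n - p)               # SD², about 37.81
sr2 = ss_w / (2 * n)                   # Sr², about 5.02
s_R = math.sqrt(sd2 + sr2 / 2)         # SR = √(SD² + Sr²/2)

print(round(s_R, 2), round(2.8 * s_R, 1))   # → 6.35 17.8
```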

 

6. Quality control of analysis methods (IQC)

 

6.1.  Reference documents

  • Resolution OIV Oeno 19/2002: Harmonized recommendations for internal quality control in analysis laboratories.
  • CITAC/EURACHEM: Guide for quality in analytical chemistry, 2002 Edition
  • Standard NF V03-115, Guide for the use of reference materials

6.2.  General principles

It is recalled that an analysis result can be affected by two types of error: systematic error, which translates into bias, and random error. For series analyses, another type of error can be defined, which can be due to both systematic error and random error: the series effect, illustrated for example by the drift of the measuring system during a series.

The IQC is designed to monitor and control these three errors.

6.3.  Reference materials

The IQC is primarily based on exploiting the measurement results for reference materials. The choice and constitution of these materials are therefore essential steps that must be controlled in order to provide an efficient basis for the system.

A reference material is defined by two parameters:

  • Its matrix
  • The assignment of its reference value

Several cases are possible; the cases encountered in oenology are summarized below, by matrix type (doped wine, natural matrix, synthetic solution) and by the mode of assignment of the reference value (value obtained by formulation, value external to the laboratory, value obtained by a reference method, value obtained by the method to be checked):

Doped wine

A doped wine is a wine with an artificial addition of an analyte.

Value obtained by formulation:

This method is applicable when the base wine is completely free of the analyte. These types of materials are suitable for oenological additives that are not native to the wine. If doping is applied with a compound native to the wine, the matrix can no longer be considered natural. Doping must be carried out according to metrological rules. The value obtained is prone to uncertainty.

This case can be used to monitor the precision of the method, as well as its accuracy at a point. It can be applied to methods sensitive to matrix effects for non-native compounds of the wine, but not in the case of native compounds of the wine.

Value external to the laboratory:

In practice, this involves conditioned wine samples, doped and/or chemically stabilized, as proposed by certain organizations. These materials cannot claim to constitute a natural matrix. The reference values are generally generated by an interlaboratory analysis chain.

This case can be used to monitor the precision of the method, as well as its accuracy at a point compared with the external standard. This has traceability value at this point if the organization supplying the samples has been approved for the preparation of the reference material in question. It cannot be applied to methods sensitive to matrix effects.

Value obtained by a reference method:

The measurement is carried out 3 times with the reference method; the value retained is the mean of the 3 results, insofar as they remain within an interval lower than the repeatability of the method.

This case can be used to monitor the precision of a method, and to check its accuracy at a point compared with the reference method. It can be applied to methods sensitive to matrix effects for non-native compounds of the wine, but not in the case of native compounds of the wine.

Value obtained by the method to be checked:

The reference value is measured using the method to be checked. The material is measured over 10 repetitions, and a check is made to ensure that the differences between these values are lower than the repeatability value; the most extreme values can be withdrawn, up to a limit of two. To ensure the consistency of the values obtained during the 10 repetitions, the series should be checked using control materials established during a previous session, placed at the start and end of the series.

This case can only be used to monitor the precision of the method; accuracy must be monitored using another approach. It can be applied to methods sensitive to matrix effects for non-native compounds of the wine, but not in the case of native compounds of the wine.

Natural matrix (wine etc.)

Natural matrices a priori constitute the most interesting reference materials because they avoid any risk of matrix effect for methods that are not perfectly specific.

Value obtained by formulation: not applicable.

Value external to the laboratory:

The external value has been determined on the wine by an interlaboratory analysis chain. Certain organizations propose conditioned wine samples whose values have been determined in this way. However, in certain cases, the wines presented in this way may have been doped and/or chemically stabilized, which means the matrix may be affected.

This case can be used to monitor the precision of a method, and to check its accuracy at a point compared with the external value. This has traceability value at this point if the analysis chain has been accredited. It can be applied to methods sensitive to matrix effects.

Value obtained by a reference method:

The measurement is carried out 3 times with the reference method; the selected value is the mean of the 3 results, insofar as they remain within an interval lower than the repeatability of the method.

This case can be used to monitor the precision of a method, and to check its accuracy at a point compared with the reference method. It can be applied to methods sensitive to matrix effects.

Value obtained by the method to be checked:

The reference value is measured by the method to be checked. The material is measured over 10 repetitions, and a check is to be made that the differences between these values are lower than the repeatability value; the most extreme values can be withdrawn, up to a limit of two. To ensure the consistency of the values obtained over the 10 repetitions, this series is to be checked by control materials established during a previous session, placed at the start and end of the series. The value obtained can also be compared with the value obtained by the reference method (during the 3 repetitions, for example). The difference between the two values must remain lower than the calculated accuracy of the alternative method compared with the reference method.

This case is of particular interest when a method produces a reproducible error specific to each sample, notably because of the non-specificity of the measured signal. This error is often minimal and lower than the uncertainty, but can generate a systematic error if the method is adjusted on a single value. This case can be used to monitor the precision of the method; accuracy must be monitored using another approach. This is notably the case of FTIR.

Synthetic solution

Synthetic solutions can be used to constitute reference materials quite easily. They are not compatible with methods that have non-specific signals or that are sensitive to matrix effects.

Value obtained by formulation:

The solution must be produced using metrological rules. It is recalled that the formulation value obtained is prone to uncertainty.

This case can be used to monitor the precision of the method, as well as its accuracy at a point in relation to a calibrated reference.

Value external to the laboratory:

The organization supplying the solution must provide guarantees about its quality and be certified if possible. The reference values will be accompanied by an uncertainty value at a given confidence level.

This case can be used to monitor the precision of a method, and to check its accuracy at a point compared with the external value. This has traceability value at this point if the supplier organization is approved for the preparation of the reference material in question. It cannot be applied to methods sensitive to matrix effects.

Value obtained by a reference method:

If the synthetic solution has not been obtained with calibrated material, the reference value can be determined by analyzing the synthetic solution using the reference method. The measurement is to be carried out at least 3 times. The selected value is the mean of the 3 results, insofar as they remain within an interval lower than the repeatability of the method. If necessary, the operator can check the consistency of the results obtained with the formulation value for the solution.

This case can be used to monitor the precision of a method, and to check its accuracy at a point compared with the reference method. It cannot be applied to methods sensitive to matrix effects.

Value obtained by the method to be checked:

The reference value is measured by the method to be checked. The material is measured over 10 repetitions, and a check will be made that the differences between these values are lower than the repeatability value; the most extreme values can be withdrawn, up to a limit of two. To ensure the consistency of the values obtained over the 10 repetitions, the series is to be checked using control materials established during a previous session, placed at the start and end of the series.

This case can be used to monitor only the precision of the method; accuracy must be monitored using another approach.

Dimensions of the table: matrix (doped wine, natural matrix, synthetic solution) × reference value (obtained by formulation, external to the laboratory, obtained by a reference method, obtained by the method to be checked).

The use of the instrument value as a reference value does not control accuracy. An alternative approach must be set up.

6.4.  Checking the analytical series

6.4.1.      Definition

An analytical series is a series of measurements carried out under repeatable conditions.

For a laboratory that mainly carries out series analyses, a check must be made that the instantaneous adjustment of the measuring instrument is correct and that it remains stable throughout the analytical series.

Two complementary approaches are possible:

  • the use of reference materials (often called, by extension, "control materials")
  • the use of an internal standard, in particular for separative methods.

6.4.2.      Checking accuracy using reference materials

Systematic error can be checked by introducing reference materials, the reference value of which has been assigned using means external to the method being checked.

 

The measured value of the reference material is associated with a tolerance limit, inside which the measured value is accepted as being valid. The laboratory defines tolerance values for each parameter and for each analytical system. These values are specific to the laboratory.

 

The control materials must be selected so that their reference values correspond to the levels of the values usually found for a given parameter. If the scale of measurement is broad, and the uncertainty of measurement is not constant on the scale, several control materials should be used to cover the various range levels.

6.4.3.      Intraseries precision

When the analytical series are rather long, there is a risk of drift of the analytical system. In this case, intraseries precision must be checked using the same reference material positioned at regular intervals in the series. The same control materials as those used for accuracy can be used.

 

The variation in the measured values for the same reference material during the series should be lower than the repeatability value r calculated for a confidence level of 95%.

 

Note  For a confidence level of 99%, a value of 3.65 × Sr can be used.

6.4.4.      Internal standard

Certain separative methods enable the introduction of an internal standard into the product to be analyzed.

In this case, the internal standard should be introduced using calibrated material with a known measurement uncertainty.

The internal standard enables a check to be made of both intraseries accuracy and precision. It should be noted that a drift affects the signals of the analyte and of the internal standard in equal proportions; since the value of the analyte is calculated from the signal of the internal standard, the effect of the drift is cancelled.

The series will be validated if the internal standards are inside the defined tolerance values.

6.5.  Checking the analysis system

6.5.1.      Definition

This is a check that complements the series check. It differs from the latter in that it compiles values acquired over long time scales, and/or compares them with values resulting from other analysis systems.

Two applications will be developed:

  • Shewhart charts to monitor the stability of the analysis system
  • Internal and external comparison of the analysis system

6.5.2.      Shewhart chart

Shewhart charts are graphic statistical tools used to monitor the drift of measurement systems, by the regular analysis, in practice under reproducibility conditions, of stable reference materials.

6.5.2.1.                        Data acquisition

A stable reference material is measured for a sufficiently long period, at defined regular intervals. These measurements are recorded and logged in control charts. The measurements are made under reproducibility conditions, and are in fact exploitable for the calculation of reproducibility, and for the assessment of measurement uncertainty.

The values of the analytical parameters of the reference materials selected must be within valid measurement ranges.

The reference materials are analyzed during an analytical series, routine if possible, with a variable position in the series from one time to another. In practice, it is perfectly possible to use the measurements of control materials of the series to input the control charts.

6.5.2.2.                        Presentation of results and definition of limits

The individual results are compared with the accepted value of the reference material, and with the reproducibility standard deviation for the parameter in question, at the range level in question.

Two types of limits are defined in the Shewhart charts, the limits associated with individual results, and the limits associated with the mean.

The limits defined for the individual results are usually based on the standard deviation values for intralaboratory reproducibility for the range level in question. They are of two types:

  • alert limit: +/- 2 × SR
  • action limit: +/- 3 × SR

The limit defined for the cumulated mean narrows as the number of measurements increases.

This limit is an action limit: +/- 3 × SR/√n, n being the number of measurements indicated on the chart.

Note  For reasons of legibility, the alert limit of the cumulated mean, whose value is +/- 2 × SR/√n, is only rarely reproduced on the control chart.

6.5.2.3.                        Using the Shewhart chart

Below we indicate the operating criteria most frequently used. It is up to the laboratories to precisely define the criteria they apply.

Corrective action on the method (or the apparatus) will be undertaken:

a) if an individual result is outside the action limits of the individual results.

b) if two consecutive individual results are located outside the alert limits of individual results.

c) if, in addition, a posteriori analysis of the control charts indicates a drift in the method in three cases:

  • nine consecutive individual result points are located on the same side of the line of the reference values.
  • six successive individual result points ascend or descend.
  • two successive points out of three are located between the alert limit and the action limit.

d) if the arithmetic mean of n recorded results is beyond one of the action limits of the cumulated mean (which highlights a systematic deviation of the results).

Note The control chart must be re-initialized at n = 1 as soon as a corrective action has been carried out on the method.
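A minimal sketch of these decision criteria in Python (illustrative, assuming alert and action limits of 2 and 3 times the intralaboratory reproducibility standard deviation; only criteria a, b, the first case of c, and d are implemented):

```python
def shewhart_flags(results, ref, s_R):
    """Flag common Shewhart decision criteria for a sequence of
    individual control-material results.

    ref : accepted value of the reference material
    s_R : intralaboratory reproducibility standard deviation
    """
    alert, action = 2 * s_R, 3 * s_R
    dev = [x - ref for x in results]          # deviations from reference
    flags = []
    # a) one individual result outside the action limits
    if any(abs(d) > action for d in dev):
        flags.append("individual result outside action limits")
    # b) two consecutive results outside the alert limits
    if any(abs(dev[i]) > alert and abs(dev[i + 1]) > alert
           for i in range(len(dev) - 1)):
        flags.append("two consecutive results outside alert limits")
    # c) nine consecutive results on the same side of the reference line
    if any(all(d > 0 for d in dev[i:i + 9]) or all(d < 0 for d in dev[i:i + 9])
           for i in range(len(dev) - 8)):
        flags.append("nine consecutive results on the same side")
    # d) cumulated mean outside its action limits (3·SR/√n)
    n = len(results)
    if abs(sum(dev) / n) > action / n ** 0.5:
        flags.append("cumulated mean outside action limits")
    return flags
```

A laboratory would extend this with its own precise criteria (ascending or descending runs, two points out of three between the alert and action limits, etc.).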

 

6.5.3.      Internal comparison of analysis systems

In a laboratory that has several analysis methods for a given parameter, it is useful to carry out measurements of the same test materials with each of them in order to compare the results. The agreement between the results of the two methods is considered satisfactory if their difference remains lower than 2 times the standard deviation of difference calculated during validation, with a confidence level of 95%.

Note  This interpretation is possible given the assumption that the variations obey a normal law with a 95% confidence rate.

6.5.4.      External comparison of the analysis system

6.5.4.1.                        Analysis chain of interlaboratory comparisons

The organization of the tests and calculations is given in the chapter "comparison with an interlaboratory analysis chain".

In addition to checking the accuracy by means of the Zscore, the results can be analyzed in greater detail, in particular with regard to the position of the values of the laboratory in relation to the mean. If they are systematically on the same side of the mean for several successive analysis chains, this can justify the implementation of corrective action by the laboratory, even if the Zscore remains lower than the critical value.

Note Interpreting the Zscore is possible given the assumption that the variations obey a normal law with a 95% confidence rate.

If the intercomparison chain is subject to accreditation, this work of comparison has traceability value.

6.5.4.2.                        Comparison with external reference materials

Measuring external reference materials at regular intervals can also be used to detect the occurrence of a systematic error (bias).

The principle is to measure the external reference material, and to accept or refuse the value in relation to tolerance limits. These limits are defined in relation to the combination of the uncertainties of the controlled method and the reference value of the reference material.

6.5.4.2.1.                 Standard uncertainty of reference material

The reference values of these materials are accompanied by confidence intervals. The laboratory must determine the nature of these data, and deduce from them the standard uncertainty value uref for the reference value. A distinction must be made between several cases:

  • The case in which the uncertainty a is given in the form of a confidence interval at 95% (expanded uncertainty). This means that a normal law has been adopted; a therefore constitutes an "expanded uncertainty" and corresponds to 2 times the standard uncertainty of the reference value of the material provided: uref = a/2.

  • The case of a certificate, or another specification, giving limits +/- a without specifying the confidence level. In this case, a rectangular dispersion is adopted, and the measured value X has the same chance of taking any value in the interval ref +/- a: uref = a/√3.

  • The particular case of glassware giving limits +/- a. This is the framework of a triangular dispersion: uref = a/√6.

6.5.4.2.2.                 Defining the validity limits of measuring reference material

To the standard uncertainty uref of the value of the external reference material is added the standard uncertainty uC of the laboratory method to be checked. These two sources of variability must be taken into account in order to determine the limits.

uC is calculated from the expanded uncertainty UC of the laboratory method in the following way:

uC = UC / 2

The validity limit of the result (with a confidence level of 95%) = ref +/- 2 × √( uref² + uC² )

Example: A pH 7 buffer solution is used to check a pH-meter. The confidence interval given for the buffer solution is +/- 0.01; it is indicated that this interval corresponds to the expanded uncertainty with a confidence level of 95%, so uref = 0.01/2 = 0.005. In addition, the expanded uncertainty of the pH-meter is 0.024, so uC = 0.024/2 = 0.012.

The limits will be:

+/- 2 × √( 0.005² + 0.012² ), i.e. +/- 0.026 in relation to the reference value, with a confidence level of 95%.
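The limit calculation can be sketched in Python (illustrative function; the divisors follow the three dispersion cases described above):

```python
import math

def tolerance_limit(a_ref, kind, U_method):
    """Half-width of the 95% validity interval when checking a method
    against an external reference material.

    a_ref    : half-interval quoted for the reference value
    kind     : how a_ref is stated, giving its standard uncertainty
               'expanded'    (95%, normal law)  -> u = a / 2
               'rectangular' (no level given)   -> u = a / sqrt(3)
               'triangular'  (e.g. glassware)   -> u = a / sqrt(6)
    U_method : expanded uncertainty of the method to be checked (k = 2)
    """
    divisor = {"expanded": 2, "rectangular": math.sqrt(3),
               "triangular": math.sqrt(6)}[kind]
    u_ref = a_ref / divisor           # standard uncertainty of reference
    u_method = U_method / 2           # standard uncertainty of the method
    return 2 * math.sqrt(u_ref ** 2 + u_method ** 2)

# pH-meter example: buffer quoted +/- 0.01 (expanded, 95%),
# pH-meter expanded uncertainty 0.024
print(round(tolerance_limit(0.01, "expanded", 0.024), 3))   # → 0.026
```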

7. Assessment of measurement uncertainty

 

7.1.  Definition

Parameter, associated with the result of a measurement, which characterizes the dispersion of the values that can reasonably be allotted to the measurand.

In practice, uncertainty is expressed in the form of a standard deviation called standard uncertainty u(x), or in an expanded form U = +/- k × u(x) (generally with k = 2).

7.2.  Reference documents

  • AFNOR ENV 13005 Standard: 1999 – Guide to the expression of uncertainty in measurement
  • EURACHEM/CITAC Guide, Quantifying Uncertainty in Analytical Measurement, second edition, 2000
  • ISO 5725 Standard: 1994 – Accuracy (trueness and precision) of measurement methods and results
  • ISO 21748 Standard: 2004 – Guidance for the use of repeatability, reproducibility and trueness estimates in measurement uncertainty estimation
  • Perruchet C. and Priel M., Estimating Uncertainty, AFNOR Publications, 2000

7.3.  Scope

Uncertainty provides two types of information.

  • On the one hand, that intended for the customers of the laboratory, indicating the potential variations to take into account in order to interpret the result of an analysis. It must be indicated, however, that this information cannot be used as an external means of evaluating the laboratory.
  • In addition, it constitutes a dynamic in-house tool for evaluating the quality of the laboratory analysis results. Insofar as its evaluation is regular and based on a fixed, well-defined methodology, it can be used to see whether the variations involved in a method change positively or negatively (in the case of an estimate based exclusively on intralaboratory data).

The present guide limits itself to providing a practical methodology for oenological laboratories dealing with series analyses. These laboratories have large volumes of data of a significant statistical scale.

Estimating uncertainties can therefore be carried out in most cases using the data collected as part of validation and quality control work (in particular the data in the Shewhart charts). These data can be supplemented by experiment schedules, in particular to determine the systematic errors.

The reference systems describe two main approaches for determining uncertainty: the intralaboratory approach and the interlaboratory approach. Each provides results that are naturally and significantly different, and their significance and interpretation cannot be identical.

  • the intralaboratory approach provides a result specific to the method in question, in the laboratory in question. The uncertainty that results is an indicator of the performance of the laboratory for the method in question. It answers the customer as follows: "what dispersion of results can I expect from the laboratory practicing the method?”
  • the interlaboratory approach uses results resulting from interlaboratory tests, which provide information about the overall performance of the method.

Laboratories can use the two approaches jointly. It will be interesting to see whether the results obtained using the intralaboratory approach give values lower than the values of the interlaboratory approach.

7.4.  Methodology

The work of uncertainty assessment involves 3 fundamental steps.

  • Definition of the measurand, and description of the quantitative analysis method
  • Critical analysis of the measurement process
  • Uncertainty assessment.

7.4.1.      Definition of the measurand, and description of the quantitative analysis method

First of all, the following must be specified:

  • the purpose of the measurement
  • the quantity measured
  • If the measurand is to be obtained by calculation based on measured quantities, if possible the mathematical relation between them should be stipulated.
  • all the operating conditions.

These items are included in theory in the procedures of the laboratory quality system.

In certain cases the expression of the mathematical relation between the measurand and the quantities can be highly complex (physical methods etc.), and it is neither necessarily relevant nor possible to fully detail them.

7.4.2.      Critical analysis of the measurement process

The sources of error influencing the final result should be identified in order to constitute the uncertainty budget. The importance of each source can be estimated, in order to eliminate those that have only a negligible influence. This is done by estimating:

  • the severity of the drift generated by poor control of the factor in question
  • the frequency of the potential problems
  • their detectability.

This critical analysis can, for example, be carried out using the "5M” method.

Labor: operator effect.

Matter: sample effect (stability, homogeneity, matrix effects) and consumables (reagents, products, solutions, reference materials), etc.

Hardware: equipment effect (response, sensitivity, integration modes, etc.) and laboratory equipment (balance, glassware, etc.).

Method: effect of applying the procedure (operating conditions, succession of the operations, etc.).

Medium: environmental conditions (temperature, pressure, lighting, vibration, radiation, moisture, etc.).

7.4.3.      Estimation calculations of standard uncertainty (intralaboratory approach)

7.4.3.1.                        Principle

In the case of laboratories using large series of samples with a limited number of methods, a statistical approach based on intralaboratory reproducibility, supplemented by the calculation of sources of errors not taken into account under intralaboratory reproducibility conditions, appears to be the most suitable approach.

An analysis result deviates from the true value under the effect of two types of error: systematic errors and random errors.

 

Analysis result = True value + Systematic error + Random error

Uncertainty characterizes the dispersion of the analysis result. This translates into a standard deviation:

Variability (analysis result) = uncertainty

Variability (true value) = 0

Variability (systematic error) = usyst (standard uncertainty of the systematic errors)

Variability (random error) = SR (intralaboratory reproducibility standard deviation)

Since standard deviations combine quadratically, the estimated standard uncertainty u(x) takes the following form:

u(x) = √(SR² + Σ usyst²)

Sources of error that cannot be integrated under the intralaboratory reproducibility conditions, i.e. systematic errors, must be determined in the form of standard deviations, to be combined with each other and with the reproducibility standard deviation.
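The quadratic combination above can be sketched in a few lines of Python; the function name and the numerical figures are illustrative, not taken from the text.

```python
import math

def combined_standard_uncertainty(s_r, systematic_u):
    """Quadratic combination of the intralaboratory reproducibility
    standard deviation S_R with the standard uncertainties of systematic
    error sources not covered by the reproducibility conditions."""
    return math.sqrt(s_r ** 2 + sum(u ** 2 for u in systematic_u))

# Illustrative figures: S_R = 0.017 and one residual gauging-line
# standard uncertainty of 0.008
print(combined_standard_uncertainty(0.017, [0.008]))  # ~0.0188
```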

 

The laboratory can take action so that the reproducibility conditions applied make it possible to include a maximum number of sources of error. This is obtained in particular by constituting test materials that are stable over a sufficiently long period, during which the laboratory takes care to vary all the possible experimental factors. In this way, SR will cover the greatest possible number of (random) sources of error, and the work involved in estimating the systematic errors, which is often more complex to carry out, will be minimized.

It should be noted here that the EURACHEM/CITAC guide entitled "Quantifying uncertainty in analytical measurements" recalls that "In general, the ISO Guide requires that corrections be applied for all systematic effects that are identified and significant". In a method "under control", systematic errors should therefore constitute a minor part of uncertainty.

The following non-exhaustive list gives examples of typical sources of error and proposes an estimation approach for each of them, using integration under reproducibility conditions as much as possible.

  • Sampling (constitution of the sample); random error. Sampling is one of the activities defined in the ISO 17025 standard; laboratories stating that they do not perform sampling do not include this source of error in the uncertainty assessment. Estimation: can be included in intralaboratory reproducibility by including sampling in the handling.

  • Sub-sampling (taking a quantity of sample in order to carry out the test); random error. Significant if the sample is not homogeneous; this source of error remains minor for wine. Estimation: included under the intralaboratory reproducibility conditions if the test material used is similar to routine test materials.

  • Stability of the sample; random error. Depends on the storage conditions of the sample; in the case of wines, laboratories should pay particular attention to losses of sulfur dioxide and ethanol. Estimation: possible changes in the sample can be integrated into the reproducibility conditions, and this source of uncertainty can then be evaluated overall.

  • Gauging of the apparatus; systematic/random error. This error is systematic if gauging is established for a long period, and becomes random if gauging is regularly redone over a time-scale integrated under the reproducibility conditions. It must be taken into account in absolute methods (error of the gauging line, § 7.4.3.3.1). Estimation: taken into account under the reproducibility conditions if gauging is regularly revised.

  • Contamination or memory effect; random error. This effect is minimized by the proper design of measuring instruments and suitable rinsing operations. Estimation: taken into account under the reproducibility conditions, as long as the reference materials are inserted at various positions in the analysis series.

  • Precision of automata; random error. This applies to intraseries drift in particular, which can be controlled by positioning the control materials within the framework of the IQC. Estimation: taken into account under the reproducibility conditions, as long as the reference materials are inserted at various positions in the analysis series.

  • Purity of the reagents; random error. The purity of the reagents has very little effect on relative methods, insofar as the gauging and analyses are carried out with the same batches of reagents; the effect is to be taken into account in absolute methods. Estimation: to be integrated under the reproducibility conditions by using various batches of reagents.

  • Measurement conditions; random error. Effects of temperature, moisture, etc. Estimation: typically taken into account under the reproducibility conditions.

  • Matrix effect; random from one sample to another, systematic on the same sample. To be taken into account in methods whose measured signal is not perfectly specific; this effect is not integrated under the reproducibility conditions. Estimation: if the effect is regarded as significant, a specific experiment schedule can be used to estimate the resulting uncertainty (§ 7.4.3.3.3).

  • Gauging effect; systematic if gauging is constant, random if gauging is regularly renewed. Estimation: taken into account under the reproducibility conditions if gauging is regularly renewed. If the same gauging remains in use on the scale of the periods covered by the reproducibility conditions, it is advisable to implement an experiment schedule in order to estimate the error of the gauging line (§ 7.4.3.3.1).

  • Operator effect; random error. Estimation: to be taken into account under the reproducibility conditions by taking care to involve all the authorized operators.

  • Bias; systematic error. Must be minimized by the quality control work of the laboratory. Estimation: this systematic effect can be estimated using certified reference materials.

7.4.3.2.                        Calculating the standard deviation of intralaboratory reproducibility

The reproducibility standard deviation SR is calculated using the protocol described in the section entitled "Intralaboratory reproducibility" (cf. § 5.4.3.5).

The calculation can be based on several test materials. In the common case where SR is proportional to the value of the measurand, the data collected on test materials with different values should not be pooled in absolute terms: SR should instead be expressed as a relative value (%).
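A minimal numerical sketch of why SR is expressed as a relative value in that case; the data are invented for illustration.

```python
# Three hypothetical test materials whose reproducibility standard
# deviation grows in proportion to the mean value.  Pooling the absolute
# SDs would mix incompatible figures; the relative value is stable.
means = [0.5, 2.0, 8.0]   # mean values of the test materials
sds = [0.01, 0.04, 0.16]  # their reproducibility standard deviations
relative_sr = [100 * s / m for s, m in zip(sds, means)]
print(relative_sr)  # each level gives ~2.0 (%)
```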

7.4.3.3.                        Estimating typical sources of systematic errors not taken into account under reproducibility conditions

7.4.3.3.1.                 Gauging error (or calibration error)

Whenever the gauging of an instrument (or the calibration of an absolute method) is not regularly redone, the resulting error cannot be integrated into the reproducibility values. An experiment schedule must then be carried out in order to estimate it, using the residual error of the regression.

7.4.3.3.1.1.           Procedure

The approach is similar to that carried out in the linearity study of the method.

A number n of reference materials should be used; n must be greater than 3, but it is not necessary to go beyond 10. Each reference material is measured p times under intralaboratory precision conditions; p must be greater than 3, and a value of 5 is generally recommended. The accepted values of the reference materials must be evenly distributed over the range of values under study, and the number of measurements must be the same for all the reference materials.

The results are reported in a table presented as follows:

Reference material | Accepted value | Replica 1 |  … | Replica j |  … | Replica p
-------------------+----------------+-----------+----+-----------+----+----------
        1          |       x1       |    y11    |  … |    y1j    |  … |    y1p
        …          |       …        |     …     |    |     …     |    |     …
        i          |       xi       |    yi1    |  … |    yij    |  … |    yip
        …          |       …        |     …     |    |     …     |    |     …
        n          |       xn       |    yn1    |  … |    ynj    |  … |    ynp

7.4.3.3.1.2.           Calculations and results

The linear regression model is calculated:

yij = a + b·xi + eij

where

yij is the jth replica of the ith reference material,

xi is the accepted value of the ith reference material,

b is the slope of the regression line,

a is the intercept of the regression line,

a + b·xi represents the expectation of the measured value for the ith reference material,

eij is the difference between yij and the expectation of the measured value of the ith reference material.

The parameters of the regression line are obtained using the following formulae:

  • mean of the p measurements of the ith reference material: ȳi = (1/p) Σj yij
  • mean of all the accepted values of the n reference materials: Mx = (1/n) Σi xi
  • mean of all the measurements: My = (1/n) Σi ȳi
  • estimated slope: b = Σi (xi - Mx)(ȳi - My) / Σi (xi - Mx)²
  • estimated intercept: a = My - b·Mx
  • regression value associated with the ith reference material: ŷi = a + b·xi
  • residual: eij = yij - ŷi

Estimating the standard uncertainty associated with the gauging line (or calibration line)

If the errors due to the regression line are constant over the entire range, the standard uncertainty is estimated in a global, single way by the overall residual standard deviation:

Sres = √[ Σi Σj (yij - ŷi)² / (np - 2) ]

If the errors due to the regression line are not constant over the entire range, the standard uncertainty is estimated for a given level by the residual standard deviation calculated for that level.

Note These estimates of standard deviations can be used if the linear regression model and the gauging (or calibration) domain have been validated (see § 5.3.1)
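The calculations above can be sketched as a short routine; the helper name and the accepted values and duplicate measurements below are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def gauging_line(x_vals, y_table):
    """Fit the least-squares line y = a + b*x to p replicate measurements
    of n reference materials; return the slope b, the intercept a and the
    overall residual standard deviation (np - 2 degrees of freedom)."""
    xs, ys = [], []
    for x, replicas in zip(x_vals, y_table):
        xs.extend([x] * len(replicas))
        ys.extend(replicas)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    s_res = math.sqrt(sum((y - (a + b * x)) ** 2
                          for x, y in zip(xs, ys)) / (n - 2))
    return b, a, s_res

# Hypothetical data: 3 reference materials, 2 replicas each
b, a, s_res = gauging_line([0.0, 1.0, 2.0],
                           [[0.0, 0.2], [1.0, 1.2], [2.0, 2.2]])
print(b, a, s_res)  # slope ~1.0, intercept ~0.1, Sres ~0.122
```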

7.4.3.3.2.                 Bias error

The EURACHEM guide "Quantifying uncertainty in analytical measurements" recalls that the ISO guide generally requires that corrections be applied for all identified significant systematic effects. The same applies to the bias of methods for which the laboratory implements its quality control system (see § 6), a bias which tends towards 0 for methods "under control".

In practice, a distinction can be made between two cases:

7.4.3.3.2.1.           Methods adjusted with only one certified reference material

Bias is permanently adjusted with the same reference material.

The certified reference material (CRM) ensures the metrological traceability of the method. A reference value has been allotted to the CRM, together with its standard uncertainty uref. This standard uncertainty of the CRM is combined with the compound standard uncertainty of the method, umethod, to determine the overall standard uncertainty of the laboratory method u(x).

The overall standard uncertainty of the method adjusted with the CRM in question is therefore:

u(x) = √(uref² + umethod²)

Note 1 The methodology is identical in the case of methods adjusted with the results of an interlaboratory comparison chain.

Note 2 Note the difference between a CRM used to adjust the bias of a method, in which the uncertainty of its reference value combines with that of the method, and a CRM used to control a method adjusted by other means (cf. § 6.5.4.2). In the second case, the uncertainty of the CRM should not be used for the uncertainty assessment of the method.
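The combination described for a method adjusted with a single CRM can be sketched numerically; the figures are invented for illustration.

```python
import math

# Illustrative figures: CRM reference-value standard uncertainty u_ref
# and the method's compound standard uncertainty u_method combine
# quadratically into the overall standard uncertainty u(x).
u_ref, u_method = 0.005, 0.017
u_x = math.sqrt(u_ref ** 2 + u_method ** 2)
print(u_x)  # ~0.0177
```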

7.4.3.3.2.2.           Methods adjusted with several reference materials (gauging ranges etc.)

There is no particular adjustment of bias apart from gauging work.

Each calibrator introduces an uncertainty of bias. There is therefore an overall theoretical bias uncertainty, which is a combination of the uncertainties of each calibrator. This uncertainty is difficult to estimate, but it generally proves to be sufficiently low to be ignored, in particular if the laboratory monitors the quality of its calibrators and the uncertainty of their reference values.

Other than in specific cases, bias uncertainty is ignored here.

7.4.3.3.3.                 Matrix effect

7.4.3.3.3.1.           Definition

The matrix effect introduces a source of error that is repeatable for a given sample but random from one sample to another. This error is related to the interaction of the compounds present in the product to be analyzed with the measurement of the required analyte. The matrix effect appears in methods with a nonspecific signal.

The matrix effect often constitutes a small part of uncertainty, particularly in separative methods. In certain other methods, including the infra-red techniques, it is a significant component of uncertainty.

Example: Estimate of the matrix effect on FTIR

The signal for the FTIR, or infra-red spectrum, is not specific to each of the compounds measured by this technique. The statistical gauging model can be used to process disturbed, nonspecific spectral data into a sufficiently exact estimate of the value of the measurand. This model integrates the influences of the other compounds of the wine, which vary from one wine to the next and introduce an error into the result. Upstream of the routine analysis work, special work is carried out by the gauging developers to minimize this matrix effect and to make the gauging robust, i.e. capable of integrating these variations without reflecting them in the result. Nevertheless, the matrix effect is always present and constitutes a source of error at the origin of a significant part of the uncertainty of an FTIR method.

To be completely rigorous, this matrix-effect error could be estimated by comparing, on the one hand, the means of a great number of FTIR measurement replicas, obtained on several reference materials (at least 10) under reproducibility conditions, with, on the other hand, the true values of reference materials with a natural wine matrix. The standard deviation of the differences gives this variability of the gauging, provided that the gauging has been adjusted beforehand (bias = 0).

This theoretical approach cannot be applied in practice, because the true values are never known, but it is experimentally possible to come sufficiently close to it:

  • As a preliminary, the FTIR gauging must be statistically adjusted (bias = 0) against a reference method, based on at least 30 samples. This eliminates the effects of bias in the subsequent measurements.
  • The reference materials must be natural wines. It is advisable to use at least 10 different reference materials, with values located inside a range level whose uncertainty can be considered constant.
  • An accepted reference value is acquired from the mean of several measurements by the reference method, carried out under reproducibility conditions. This lowers the uncertainty of the reference value: if, for the reference method used, all the significant sources of uncertainty are covered by the reproducibility conditions, multiplying the number p of measurements carried out under reproducibility conditions enables the uncertainty associated with their mean to be divided by √p. The mean obtained from a sufficient number of measurements will then have a low, even negligible, level of uncertainty in relation to the uncertainty of the alternative method, and can therefore be used as a reference value. p must be at least equal to 5.
  • The reference materials are analyzed by the FTIR method, with several replicas acquired under reproducibility conditions. By multiplying the number of measurements q under reproducibility conditions on the FTIR method, the variability related to the precision of the method (random error) can be decreased: the mean of these measurements will have a standard deviation of variability divided by √q. This random error can then become negligible in relation to the variability linked to the gauging (matrix effect) that we are trying to estimate. q must be at least equal to 5.
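The averaging argument used in the last two points can be sketched as follows; the figure 0.017 is illustrative, not from the text.

```python
import math

# The mean of p measurements made under reproducibility conditions has a
# standard uncertainty equal to the single-measurement value divided by
# the square root of p.
def uncertainty_of_mean(u_single, p):
    return u_single / math.sqrt(p)

print(uncertainty_of_mean(0.017, 5))  # ~0.0076
```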

The following example is applied to the determination of acetic acid by FTIR gauging. The reference values are given by 5 measurements under reproducibility conditions on 7 stable test materials. The number of 7 materials is in theory insufficient, but the data here are only given by way of an example.

 

 

 

Material |     Reference method     | Mean ref |           FTIR           | Mean FTIR |  Diff
         |  1    2    3    4    5   |          |  1    2    3    4    5   |           |
---------+--------------------------+----------+--------------------------+-----------+--------
    1    | 0.30 0.32 0.31 0.30 0.31 |  0.308   | 0.30 0.31 0.31 0.30 0.30 |   0.305   | -0.004
    2    | 0.31 0.32 0.32 0.32 0.31 |  0.316   | 0.31 0.32 0.30 0.31 0.31 |   0.315   | -0.006
    3    | 0.38 0.39 0.39 0.38 0.38 |  0.384   | 0.37 0.37 0.37 0.37 0.36 |   0.37    | -0.016
    4    | 0.25 0.25 0.25 0.24 0.25 |  0.248   | 0.26 0.26 0.26 0.25 0.26 |   0.26    |  0.01
    5    | 0.39 0.39 0.40 0.40 0.39 |  0.394   | 0.43 0.42 0.43 0.42 0.42 |   0.425   |  0.03
    6    | 0.27 0.26 0.26 0.26 0.26 |  0.262   | 0.25 0.26 0.25 0.25 0.26 |   0.255   | -0.008
    7    | 0.37 0.37 0.37 0.37 0.36 |  0.368   | 0.37 0.36 0.36 0.35 0.36 |   0.365   | -0.008

Calculation of the differences: diff = Mean FTIR - Mean ref.

The mean of the differences, Md = 0.000, verifies that the FTIR is well adjusted with respect to the reference method.

The standard deviation of the differences is Sd = 0.015. It is this standard deviation that is used to estimate the variability generated by the gauging, and we can therefore state that:

uf = Sd = 0.015

 

NOTE The value of uf can be over-estimated by this approach. If the laboratory considers that the value is significantly excessive under the operating conditions defined here, it can increase the number of measurements of the reference method and/or the FTIR method.

The reproducibility conditions include all the other significant sources of error; SR was calculated separately: SR = 0.017.

 

The standard uncertainty of the determination of acetic acid by this FTIR application is therefore:

u(x) = √(SR² + uf²) = √(0.017² + 0.015²) = 0.023
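As a numerical check, the worked example can be reproduced from the seven differences in the table:

```python
import math

# The seven differences (Mean FTIR - Mean ref) from the worked example.
diffs = [-0.004, -0.006, -0.016, 0.010, 0.030, -0.008, -0.008]
n = len(diffs)
md = sum(diffs) / n                     # mean of the differences, ~0.000
sd = math.sqrt(sum((d - md) ** 2 for d in diffs) / (n - 1))  # ~0.015
s_r = 0.017                             # reproducibility SD given in the text
u = math.sqrt(s_r ** 2 + sd ** 2)       # combined standard uncertainty ~0.023
print(md, sd, u)
```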

 

7.4.3.3.4.                 Sample effect

In certain cases, the experiment schedules used to estimate uncertainty are based on synthetic test materials. In such a situation, the estimate does not cover the sample effect (homogeneity). The laboratories must therefore estimate this effect.

It should be noted, however, that this effect is often negligible in oenological laboratories, which use homogeneous samples of small quantities.

7.4.4.      Estimating standard uncertainty by interlaboratory tests

7.4.4.1.                        Principle

The interlaboratory approach uses data output by interlaboratory tests from which a standard deviation of interlaboratory reproducibility is calculated, in accordance with the principles indicated in §5.4.3. The statisticians responsible for calculating the results of the interlaboratory tests can identify "aberrant" laboratory results, by using tests described in the ISO 5725 standard (Cochran test). These results can then be eliminated after agreement between the statisticians and the analysts.

For the uncertainty assessment by interlaboratory approach, the guidelines stated in the ISO 21748 standard are as follows:

  • The reproducibility standard deviation (interlaboratory) obtained in a collaborative study is a valid basis for evaluating the uncertainty of measurement
  • Effects that are not observed as part of the collaborative study must be demonstrably negligible or be explicitly taken into account.

There are two types of interlaboratory tests:

  • Collaborative studies which relate to only one method. These studies are carried out for the initial validation of a new method in order to define the standard deviation of interlaboratory reproducibility SRinter (method).
  • Interlaboratory comparison chains, or aptitude tests. These tests are carried out to validate a method adopted by the laboratory, and the routine quality control (see § 5.3.3.3). The data are processed as a whole, and integrate all the analysis methods employed by the laboratories participating in the tests. The results are the interlaboratory mean m, and the standard deviation of interlaboratory and intermethod reproducibility SRinter.

7.4.4.2.                        Using the standard deviation of interlaboratory and intramethod reproducibility SRinter (method)

The standard deviation of interlaboratory reproducibility SRinter (method) takes into account intralaboratory variability and the overall interlaboratory variability related to the method.

It must then be taken into account that the analysis method can produce a systematic bias compared with the true value.

As part of a collaborative study, whenever possible, the error produced by this bias can be estimated by using certified reference materials, under the same conditions as described in § 7.4.3.3.2, and added to SRinter (method).

7.4.4.3.                        Using the standard deviation of interlaboratory and intermethod reproducibility SRinter

The standard deviation of interlaboratory reproducibility SRinter takes into account intralaboratory variability and interlaboratory variability for the parameter under study.

The laboratory must check its accuracy in relation to these results (see § 5.3.3).

There is no need to add components associated with method accuracy to the uncertainty budget, since in the "multi-method" aptitude tests, errors of accuracy can be considered to be taken into account in SRinter.

7.4.4.4.                        Other components in the uncertainty budget

Insofar as the test materials used for the interlaboratory tests are representative of the conventional samples analyzed by laboratories, and follow the overall analytical procedure (sub-sampling, extraction, concentration, dilution, distillation, etc.), SRinter represents the standard uncertainty u(x) of the method, in the interlaboratory sense.

Errors not taken into account in the interlaboratory tests must then be studied in order to assess their compound standard uncertainty, which will be combined with the compound standard uncertainty of the interlaboratory tests.

7.5.  Expressing expanded uncertainty

In practice, uncertainty is expressed in its expanded form: in absolute terms for methods in which uncertainty is stable over the scope in question, or in relative terms when uncertainty varies proportionally to the value of the measurand.

Absolute expanded uncertainty: U = 2·u(x)

Relative expanded uncertainty (in %): U% = 100 · 2·u(x) / Mean

where Mean represents the mean of the results obtained under reproducibility conditions.

Note This expression of uncertainty assumes that the variations obey a normal law; the resulting value is given with a confidence level of approximately 95%.
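A minimal sketch of both forms; the values 0.023 and 0.32 are illustrative, not from the text.

```python
# Coverage factor k = 2 gives the expanded uncertainty (~95 % confidence
# for a normal distribution).
def expanded(u):
    """Absolute expanded uncertainty."""
    return 2 * u

def expanded_relative(u, mean):
    """Relative expanded uncertainty in percent of the mean result."""
    return 100 * 2 * u / mean

print(expanded(0.023))                # 0.046
print(expanded_relative(0.023, 0.32)) # ~14.4 (%)
```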

 

REFERENCES

  • (1) OIV, 2001 – Recueil des méthodes internationales d'analyse des vins et des moûts; OIV Ed., Paris.
  • (2) OIV, 2002 – Recommandations harmonisées pour le contrôle interne de qualité dans les laboratoires d'analyse; OIV resolution œno 19/2002, Paris.
  • (3) Standard ISO 5725:1994 – Exactitude (justesse et fidélité) des résultats et méthodes de mesure, classification index X 06-041.
  • (4) IUPAC, 2002 – Harmonized guidelines for single-laboratory validation of methods of analysis; Pure Appl. Chem., Vol. 74, No. 5, pp. 835-855.
  • (5) Standard ISO 11095:1996 – Étalonnage linéaire utilisant des matériaux de référence, reference number ISO 11095:1996.
  • (6) Standard ISO 21748:2004 – Lignes directrices relatives à l'utilisation d'estimations de la répétabilité, de la reproductibilité et de la justesse dans l'évaluation de l'incertitude de mesure, reference number ISO/TS 21748:2004.
  • (7) Standard AFNOR V03-110:1998 – Procédure de validation intralaboratoire d'une méthode alternative par rapport à une méthode de référence, classification index V03-110.
  • (8) Standard AFNOR V03-115:1996 – Guide pour l'utilisation des matériaux de référence, classification index V03-115.
  • (9) Standard AFNOR X 07-001:1994 – Vocabulaire international des termes fondamentaux et généraux de métrologie, classification index X07-001.
  • (10) Standard AFNOR ENV 13005:1999 – Guide pour l'expression de l'incertitude de mesure.
  • (11) AFNOR, 2003 – Métrologie dans l'entreprise, outil de la qualité, 2ème édition, AFNOR.
  • (12) EURACHEM, 2000 – Quantifying Uncertainty in Analytical Measurement, second edition.
  • (13) CITAC/EURACHEM, 2002 – Guide pour la qualité en chimie analytique.
  • (14) Bouvier J.C., 2002 – Calcul de l'incertitude de mesure – Guide pratique pour les laboratoires d'analyse œnologique; Revue Française d'œnologie, No. 197, Nov-Dec 2002, pp. 16-21.
  • (15) Snakkers G. et Cantagrel R., 2004 – Utilisation des données des circuits de comparaison interlaboratoires pour apprécier l'exactitude des résultats d'un laboratoire : estimation d'une incertitude de mesure; Bull. OIV, Vol. 77, 857-876, Jan-Feb 2004, pp. 48-83.
  • (16) Perruchet C. et Priel M., 2000 – Estimer l'incertitude, AFNOR Éditions.
  • (17) Neuilly M. et CETAMA, 1993 – Modélisation et estimation des erreurs de mesure, Lavoisier Ed., Paris.

Annex N°1

Table A - Snedecor's F distribution

This table gives the values of F as a function of ν1 and ν2 for a risk of 0.05 (P = 0.950).

ν2\ν1 |    1      2      3      4      5      6      7      8      9     10
------+--------------------------------------------------------------------
   1  | 161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9
   2  | 18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40
   3  | 10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79
   4  |  7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96
   5  |  6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74
   6  |  5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06
   7  |  5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64
   8  |  5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35
   9  |  5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14
  10  |  4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98
  11  |  4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85
  12  |  4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75
  13  |  4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67
  14  |  4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60
  15  |  4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54
  16  |  4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49
  17  |  4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45
  18  |  4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41
  19  |  4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38
  20  |  4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35
  21  |  4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32
  22  |  4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30
  23  |  4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27
  24  |  4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25
  25  |  4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28   2.24
  26  |  4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27   2.22
  27  |  4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25   2.20
  28  |  4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24   2.19
  29  |  4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22   2.18
  30  |  4.17   3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21   2.16
  40  |  4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12   2.08
  60  |  4.00   3.15   2.76   2.53   2.37   2.25   2.17   2.10   2.04   1.99
 120  |  3.92   3.07   2.68   2.45   2.29   2.17   2.09   2.02   1.96   1.91
  ∞   |  3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88   1.83

Harmonised guidelines for single-laboratory validation

OIV-MA-AS1-13 Harmonised guidelines for single laboratory validation of methods of analysis (technical report)

Synopsis

Method validation is one of the measures universally recognised as a necessary part of a comprehensive system of quality assurance in analytical chemistry. In the past, ISO, IUPAC and AOAC INTERNATIONAL have co-operated to produce agreed protocols or guidelines on the "Design, Conduct and Interpretation of Method Performance Studies",1 on the "Proficiency Testing of (Chemical) Analytical Laboratories",2 on "Internal Quality Control in Analytical Chemistry Laboratories"3 and on "The Use of Recovery Information in Analytical Measurement".4 The Working Group that produced these protocols/guidelines has now been mandated by IUPAC to prepare guidelines on the single-laboratory validation of methods of analysis. These guidelines provide minimum recommendations on the procedures that should be employed to ensure adequate validation of analytical methods.

A draft of the guidelines was discussed at an International Symposium on the Harmonisation of Quality Assurance Systems in Chemical Laboratories, the proceedings of which have been published by the UK Royal Society of Chemistry.

Resulting from the Symposium on Harmonisation of Quality Assurance Systems for Analytical Laboratories, Budapest, Hungary, 4-5 November 1999, held under the sponsorship of IUPAC, ISO and AOAC INTERNATIONAL.

Contents

1. Introduction

1.1. Background

1.2. Existing protocols, standards and guides

2. Definitions and terminology

2.1. General

2.2. Definitions used in this guide

3. Method validation, uncertainty, and quality assurance

4. Basic principles of method validation

4.1. Specification and scope of validation

4.2. Testing assumptions

4.3. Sources of Error in Analysis

4.4. Method and Laboratory effects

5. Conduct of Validation Studies

6. Extent of validation studies

6.1. The laboratory is to use a “fully” validated method

6.2. The laboratory is to use a fully validated method, but new matrix is to be used

6.3. The laboratory is to use a well-established, but not collaboratively studied, method

6.4. The method has been published in the scientific literature together with some analytical characteristics

6.5. The method has been published in the scientific literature with no characteristics given or has been developed in-house

6.6. The method is empirical

6.7. The analysis is “ad hoc”

6.8. Changes in staff and equipment

7. Recommendations

8. Bibliography

Annex A: Notes on the requirements for study of method performance characteristics


A1 Applicability

A2. Selectivity

A3. Calibration and linearity

A3.1. Linearity and intercept

A3.2. Test for general matrix effect

A3.3. Final calibration procedure

A4. Trueness

A4.1. Estimation of trueness

A4.2. Conditions for trueness experiments

A4.3. Reference values for trueness experiments

A4.3.1. Certified reference materials (CRMs)

A4.3.2. Reference materials

A4.3.3. Use of a reference method

A4.3.4. Use of spiking/recovery

A5. Accuracy

A6. Recovery

A7. Concentration range

A8. Detection Limit

A9. Limit of determination or limit of quantification

A10. Sensitivity

A11. Ruggedness

A12. Fitness for purpose

A13. Matrix variation

A14. Measurement Uncertainty

Annex B Additional considerations for uncertainty estimation in validation studies

B1. Sensitivity analysis

B2. Judgement

  1. Introduction

1.1.  Background

Reliable analytical methods are required for compliance with national and international regulations in all areas of analysis. It is accordingly internationally recognised that a laboratory must take appropriate measures to ensure that it is capable of providing and does provide data of the required quality.  Such measures include:

  • using validated methods of analysis;
  • using internal quality control procedures;
  • participating in proficiency testing schemes; and
  • becoming accredited to an International Standard, normally ISO/IEC  17025.

It should be noted that accreditation to ISO/IEC 17025 specifically addresses the establishment of traceability for measurements, as well as requiring a range of other technical and management requirements including all those in the list above.

Method validation is therefore an essential component of the measures that a laboratory should implement to allow it to produce reliable analytical data. Other aspects of the above have been addressed previously by the IUPAC Interdivisional Working Party on Harmonisation of Quality Assurance Schemes for Analytical Laboratories, specifically by preparing Protocols/Guidelines on method performance (collaborative) studies,1 proficiency testing,2 and internal quality control.3

In some sectors, most notably in the analysis of food, the requirement for methods that have been “fully validated” is prescribed by legislation.5,6  “Full” validation for an analytical method is usually taken to comprise an examination of the characteristics of the method in an inter-laboratory method performance study (also known as a collaborative study or collaborative trial).  Internationally accepted protocols have been established for the “full” validation of a method of analysis by a collaborative trial, most notably the International Harmonised Protocol1 and the ISO procedure.7  These protocols/standards require a minimum number of laboratories and test materials to be included in the collaborative trial to validate fully the analytical method. However, it is not always practical or necessary to provide full validation of analytical methods. In such circumstances a “single-laboratory method validation” may be appropriate.

Single-laboratory method validation is appropriate in several circumstances including the following:

  • to ensure the viability of the method before the costly exercise of a formal collaborative trial;
  • to provide evidence of the reliability of analytical methods if collaborative trial data are not available or where the conduct of a formal collaborative trial is not practicable;
  • to ensure that “off-the-shelf” validated methods are being used correctly.

When a method is to be characterised in-house, it is important that the laboratory determines and agrees with its customer exactly which characteristics are to be evaluated.  However, in a number of situations these characteristics may be laid down by legislation (e.g. veterinary drug residues in food and pesticides in food sectors). The extent of the evaluation that a laboratory undertakes must meet the requirements of legislation.

Nevertheless in some analytical areas the same analytical method is used by a large number of laboratories to determine stable chemical compounds in defined matrices. It should be appreciated that if a suitable collaboratively studied method can be made available to these laboratories, then the costs of the collaborative trial to validate that method may well be justified. The use of a collaboratively studied method considerably reduces the efforts which a laboratory, before taking a method into routine use, must invest in extensive validation work. A laboratory using a collaboratively studied method, which has been found to be fit for the intended purpose, needs only to demonstrate that it can achieve the performance characteristics stated in the method. Such a verification of the correct use of a method is much less costly than a full single laboratory validation. The total cost to the Analytical Community of validating a specific method through a collaborative trial and then verifying its performance attributes in the laboratories wishing to use it is frequently less than when many laboratories all independently undertake single laboratory validation of the same method.

1.2.  Existing Protocols, Standards and Guides

A number of protocols and guidelines8-19 on method validation and uncertainty have been prepared, most notably in AOAC INTERNATIONAL, International Conference on Harmonisation (ICH) and Eurachem documents:

  • The Statistics manual of the AOAC, which includes guidance on single laboratory study prior to collaborative testing13;
  • The ICH text15 and methodology,16 which prescribe minimum validation study requirements for tests used to support drug approval submission;
  • The Fitness for Purpose of Analytical Methods: A Laboratory Guide to Method Validation and Related Topics (1998)12;
  • Quantifying Uncertainty in Analytical Measurement (2000)9.

Method validation was also extensively discussed at a Joint FAO/IAEA Expert Consultation, December 1997, on the Validation of Analytical Methods for Food Controls, the Report of which is available19.

The present ‘Guidelines’ bring together the essential scientific principles of the above documents to provide information which has been subjected to international acceptance and, more importantly, to point the way forward for best practice in single-laboratory method validation.

  2. Definitions and terminology

2.1.  General

Terms used in this document respect ISO and IUPAC definitions where available. The following documents contain relevant definitions:

i) IUPAC: Compendium of chemical terminology, 1987

ii) International vocabulary of basic and general terms in metrology. ISO 1993

2.2.  Definitions used in this guide only:

Relative uncertainty: Uncertainty expressed as a relative standard deviation.

Validated range: That part of the concentration range of an analytical method which has been subjected to validation.

  3. Method validation, uncertainty, and quality assurance

Method validation makes use of a set of tests which both test any assumptions on which the analytical method is based and establish and document the performance characteristics of a method, thereby demonstrating whether the method is fit for a particular analytical purpose. Typical performance characteristics of analytical methods are: applicability; selectivity; calibration; trueness; precision; recovery; operating range; limit of quantification; limit of detection; sensitivity; and ruggedness. To these can be added measurement uncertainty and fitness-for-purpose.

Strictly speaking, validation should refer to an ‘analytical system’ rather than an ‘analytical method’, the analytical system comprising a defined method protocol, a defined concentration range for the analyte, and a specified type of test material.  For the purposes of this document, a reference to ‘method validation’ will be taken as referring to an analytical system as a whole. Where the analytical procedure as such is addressed, it will be referred to as ‘the protocol’.

In this document method validation is regarded as distinct from ongoing activities such as internal quality control (IQC) or proficiency testing.  Method validation is carried out once, or at relatively infrequent intervals during the working lifetime of a method; it tells us what performance we can expect the method to provide in the future.  Internal quality control tells us about how the method has performed in the past. IQC is therefore treated as a separate activity in the IUPAC Harmonisation Programme.3

In method validation the quantitative characteristics of interest relate to the accuracy of the result likely to be obtained. Therefore it is generally true to say that method validation is tantamount to the task of estimating uncertainty of measurement. Over the years it has become traditional for validation purposes to represent different aspects of method performance by reference to the separate items listed above, and to a considerable extent these guidelines reflect that pattern.  However, with an increasing reliance on measurement uncertainty as a key indicator of both fitness for purpose and reliability of results, analytical chemists will increasingly undertake measurement validation to support uncertainty estimation, and some practitioners will want to do so immediately. Accordingly, measurement uncertainty is treated briefly in Annex A as a performance characteristic of an analytical method, while Annex B provides additional guidance on some procedures not otherwise covered.

  4. Basic principles of method validation

4.1.  Specification and scope of validation

Validation applies to a defined protocol, for the determination of a specified analyte and range of concentrations in a particular type of test material, used for a specified purpose. In general, validation should check that the method performs adequately for the purpose throughout the range of analyte concentrations and test materials to which it is applied.  It follows that these features, together with a statement of any fitness-for-purpose criteria, should be completely specified before any validation takes place.

4.2.  Testing assumptions

In addition to the provision of performance figures which indicate fitness for purpose and have come to dominate the practical use of validation data, validation studies act as an objective test of any assumptions on which an analytical method is based. For example, if a result is to be calculated from a simple straight line calibration function, it is implicitly assumed that the analysis is free from significant bias, that the response is proportional to analyte concentration, and that the dispersion of random errors is constant throughout the range of interest. In most circumstances, such assumptions are made on the basis of experience accumulated during method development or over the longer term, and are consequently reasonably reliable. Nonetheless, good measurement science relies on tested hypotheses. This is the reason that so many validation studies are based on statistical hypothesis testing; the aim is to provide a basic check that the reasonable assumptions made about the principles of the method are not seriously flawed.

There is an important practical implication of this apparently abstruse note. It is easier to check for gross departure from a reliable assumption than to ‘prove’ that a particular assumption is correct.  Thus, where there is long practice of the successful use of a particular analytical technique (such as gas chromatographic analysis, or acid digestion methods) across a range of analytes and matrices, validation checks justifiably take the form of relatively light precautionary tests. Conversely, where experience is slight, the validation study needs to provide strong evidence that the assumptions made are appropriate in the particular cases under study, and it will generally be necessary to study the full range of circumstances in detail. It follows that the extent of validation studies required in a given instance will depend, in part, on the accumulated experience of the analytical technique used.

In the following discussion, it will be taken for granted that the laboratory is well practised in the technique of interest, and that the purpose of any significance tests is to check that there is no strong evidence to discount the assumptions on which the particular protocol relies. The reader should bear in mind that more stringent checks may be necessary for unfamiliar or less established measurement techniques.

4.3.  Sources of Error in Analysis

Errors in analytical measurements arise from different sources[*] and at different levels of organisation.  One useful way of representing these sources (for a specific concentration of analyte) is as follows[+]24:

  • random error of measurement (repeatability);
  • run bias;
  • laboratory bias;
  • method bias;
  • matrix variation effect.

Though these different sources may not necessarily be independent, this list provides a useful way of checking the extent to which a given validation study addresses the sources of error.

The repeatability (within-run) term includes contributions from any part of the procedure that varies within a run, including contributions from the familiar gravimetric and volumetric errors, heterogeneity of the test material, and variation in the chemical treatment stages of the analysis, and is easily seen in the dispersion of replicated analyses. The run effect accounts for additional day-to-day variations in the analytical system, such as changes of analyst, batches of reagents, recalibration of instruments, and the laboratory environment (e.g., temperature changes). In single-laboratory validation, the run effect is typically estimated by conducting a designed experiment with replicated analysis of an appropriate material in a number of separate runs. Between-laboratory variation arises from factors such as variations in calibration standards, differences between local interpretations of a protocol, changes in equipment or reagent source or environmental factors, such as differences in average climatic conditions. Between-laboratory variation is clearly seen as a reality in the results of collaborative trials (method performance studies) and proficiency tests, and between-method variation can sometimes be discerned in the results of the latter.
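
Such a designed experiment can be sketched as a one-way analysis of variance. The following minimal example (all data values hypothetical) separates the repeatability and between-run components from duplicate analyses of a control material in five runs:

```python
import statistics

# Hypothetical duplicate results (mg/kg) for one control material in 5 runs.
runs = [
    [10.1, 10.3],
    [10.6, 10.4],
    [9.9, 10.2],
    [10.5, 10.7],
    [10.0, 10.1],
]

n = len(runs[0])            # replicates per run (balanced design)
p = len(runs)               # number of runs
grand_mean = statistics.mean(x for run in runs for x in run)
run_means = [statistics.mean(run) for run in runs]

# Within-run (repeatability) mean square: pooled variance of replicates.
ms_within = sum(
    sum((x - m) ** 2 for x in run) for run, m in zip(runs, run_means)
) / (p * (n - 1))

# Between-run mean square.
ms_between = n * sum((m - grand_mean) ** 2 for m in run_means) / (p - 1)

s_r = ms_within ** 0.5                           # repeatability s.d.
s_run2 = max((ms_between - ms_within) / n, 0.0)  # between-run variance component
s_run = s_run2 ** 0.5
print(f"s_r = {s_r:.3f}, s_run = {s_run:.3f}")
```

The `max(..., 0.0)` guard reflects the fact that the ANOVA estimate of the between-run variance component can come out negative by chance, in which case it is conventionally set to zero.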

Generally, the repeatability, run effect and laboratory effect are of comparable magnitude, so none can safely be ignored in validation. In the past there has been a tendency for some of these aspects to be neglected, particularly when estimating and reporting uncertainty information. This results in uncertainty intervals that are too tight. For example, the collaborative trial as normally conducted does not give the complete picture because contributions to uncertainty from method bias and matrix variation are not estimated in collaborative trials and have to be addressed separately (usually by prior single-laboratory study). In single-laboratory validation there is the particular danger that laboratory bias may also be overlooked, and that item is usually the largest single contributor to uncertainty from the above list. Therefore specific attention must be paid to laboratory bias in single-laboratory validation.
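
As an illustration of why no component can safely be dropped, the following sketch (component values hypothetical) combines the repeatability, run and laboratory terms in quadrature for a result reported as the mean of n within-run replicates:

```python
# Illustrative combination of error components (all values hypothetical).
s_r = 0.15    # repeatability s.d.
s_run = 0.23  # between-run s.d.
s_lab = 0.30  # laboratory-bias s.d. (e.g. inferred from proficiency-test data)

n = 2  # replicates averaged within a single run

# Standard uncertainty of the mean of n within-run replicates:
# averaging reduces only the repeatability term, while the run and
# laboratory effects each enter once.
u = (s_r**2 / n + s_run**2 + s_lab**2) ** 0.5

# Neglecting the laboratory term understates the uncertainty.
u_no_lab = (s_r**2 / n + s_run**2) ** 0.5
print(f"u = {u:.3f}, without laboratory term = {u_no_lab:.3f}")
```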

In addition to the above-mentioned problems, the validation of a method is limited to the scope of its application, that is, the method as applied to a particular class of test material. If there is a substantial variation of matrix types within the defined class, there will be an additional source of variation due to within-class matrix effects. Of course, if the method is subsequently used for materials outside the defined class (that is, outside the scope of the validation), the analytical system cannot be considered validated: an extra error of unknown magnitude is introduced into the measurement process.

It is also important for analysts to take account of the way in which method performance varies as a function of the concentration of the analyte. In most instances the dispersion of results increases absolutely with concentration and recovery may differ substantially at high and low concentrations.  The measurement uncertainty associated with the results is therefore often dependent on both these effects and on other concentration-dependent factors. Fortunately, it is often reasonable to assume a simple relationship between performance and analyte concentration; most commonly that errors are proportional to analyte concentration.[*] However, where the performance of the method is of interest at substantially different concentrations, it is important to check the assumed relationship between performance and analyte concentration. This is typically done by checking performance at extremes of the likely range, or at a few selected levels. Linearity checks also provide information of the same kind.

4.4.  Method and Laboratory effects

It is critically important in single-laboratory method validation to take account of method bias and laboratory bias. There are a few laboratories with special facilities where these biases can be regarded as negligible, but that circumstance is wholly exceptional. (Note, however, that if there is only one laboratory carrying out a particular analysis, then method bias and laboratory bias take on a different perspective.) Normally, method and laboratory effects have to be included in the uncertainty budget, but often they are more difficult to address than repeatability error and the run effect. In general, to assess the respective uncertainties it is necessary to use information gathered independently of the laboratory. The most generally useful sources of such information are (i) statistics from collaborative trials (not available in many situations of single-laboratory method validation), (ii) statistics from proficiency tests and (iii) results from the analysis of certified reference materials.

Collaborative trials directly estimate the variance of between-laboratory biases. While there may be theoretical shortcomings in the design of such trials, these variance estimates are appropriate for many practical purposes. Consequently it is always instructive to test single-laboratory validation by comparing the estimates of uncertainty with reproducibility estimates from collaborative trials. If the single-laboratory result is substantially the smaller, it is likely that important sources of uncertainty have been neglected. (Alternatively, it may be that a particular laboratory in fact works to a smaller uncertainty than found in collaborative trials: such a laboratory would have to take special measures to justify such a claim.) If no collaborative trial has been carried out on the particular method/test material combination, an estimate of the reproducibility standard deviation at an analyte concentration c above about 120 ppb can usually be obtained from the Horwitz function, σH = 0.02c^0.8495, with both variables expressed as mass fractions. (The Horwitz estimate is normally within a factor of about two of observed collaborative study results.) It has been observed that the Horwitz function is incorrect at concentrations lower than about 120 ppb, and a modified function is more appropriate.21, 25 All of this information may be carried into the single-laboratory area with minimum change.
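
The Horwitz prediction can be sketched as follows. The two forms shown (relative and absolute) are equivalent, and the example concentration is illustrative only:

```python
import math

def horwitz_rsd_percent(c):
    """Predicted reproducibility RSD (%) for a mass fraction c (Horwitz)."""
    return 2 ** (1 - 0.5 * math.log10(c))

c = 1e-6                      # e.g. 1 mg/kg expressed as a mass fraction
rsd = horwitz_rsd_percent(c)  # relative form, about 16 % at this level
sigma_h = 0.02 * c ** 0.8495  # equivalent absolute standard deviation
print(f"predicted RSD = {rsd:.1f} %")
```

Note the text's caveat: below roughly 120 ppb this function is known to overpredict the deterioration of precision, and a modified function should be used instead.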

Statistics from proficiency tests are particularly interesting because they provide information in general about the magnitude of laboratory and method biases combined and, for the participant, information about total error on specific occasions. Statistics such as the robust standard deviation of the participants’ results for an analyte in a round of the test can in principle be used in a way similar to reproducibility standard deviations from collaborative trials, i.e., to obtain a benchmark for overall uncertainty for comparison with individual estimates from single-laboratory validation. In practice, statistics from proficiency tests may be more difficult to access, because they are not systematically tabulated and published like collaborative trial results, but are only made available to participants. Of course, if such statistics are to be used they must refer to the appropriate matrix and concentration of the analyte. Individual participants in proficiency testing schemes can also gauge the validity of their estimated uncertainty by comparing their reported results with the assigned values of successive rounds26. This, however, is an ongoing activity and therefore not strictly within the purview of single-laboratory validation (which is a one-off event).

If an appropriate certified reference material is available, a single-laboratory test allows a laboratory to assess laboratory bias and method bias in combination, by analysing the CRM a number of times.  The estimate of the combined bias is the difference between the mean result and the certified value.
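
This bias estimate can be sketched as follows (certified value and replicate results hypothetical). In practice the uncertainty of the certified value should also be taken into account before declaring a bias significant:

```python
import math
import statistics

certified = 5.00                                # hypothetical certified value (mg/kg)
results = [5.08, 5.12, 4.98, 5.15, 5.05, 5.10]  # replicate analyses of the CRM

mean = statistics.mean(results)
s = statistics.stdev(results)
n = len(results)

bias = mean - certified        # combined laboratory + method bias
t = bias / (s / math.sqrt(n))  # compare with Student's t, n - 1 d.f.
print(f"bias = {bias:+.3f} mg/kg, t = {t:.2f}")
```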

Appropriate certified reference materials are not always available, so other materials may perforce have to be used. Materials left over from proficiency tests sometimes serve this purpose and, although the assigned values of the materials may have questionable uncertainties, their use certainly provides a check on overall bias. Specifically, proficiency test assigned values are generally chosen to provide a minimally biased estimate, so a test for significant bias against such a material is a sensible practice. A further alternative is to use spiking and recovery information4 to provide estimates of these biases, although there may be unmeasurable sources of uncertainty associated with these techniques.

Currently the least recognised effect in validation is that due to matrix variation within the defined class of test material. The theoretical requirement for the estimation of this uncertainty component is for a representative collection of test materials to be analysed in a single run, their individual biases estimated, and the variance of these biases calculated. (Analysis in a single run means that higher level biases have no effect on the variance. If there is a wide concentration range involved, then allowance for the change in bias with concentration must be made.) If the representative materials are certified reference materials, the biases can be estimated directly as the differences between the results and the reference values, and the whole procedure is straightforward. In the more likely event that an insufficient number of certified reference materials is available, recovery tests with a range of typical test materials may be resorted to, with due caution. Currently there is very little quantitative information about the magnitude of uncertainties from this source, although in some instances they are suspected of being large.
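
The variance-of-biases calculation described above can be sketched as follows (all values hypothetical):

```python
import statistics

# (result, reference value) pairs for representative materials of the
# defined class, all analysed in a single run so that higher-level biases
# cancel out of the variance.
pairs = [(10.2, 10.0), (20.7, 20.4), (15.1, 15.3), (8.0, 7.9), (12.6, 12.4)]

biases = [result - ref for result, ref in pairs]
s_matrix = statistics.stdev(biases)  # matrix-variation standard deviation
print(f"s_matrix = {s_matrix:.3f}")
```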

  5. Conduct of Validation Studies

The detailed design and execution of method validation studies is covered extensively elsewhere and will not be repeated here. However, the main principles are pertinent and are considered below:

It is essential that validation studies are representative. That is, studies should, as far as possible, be conducted to provide a realistic survey of the number and range of effects operating during normal use of the method, as well as to cover the concentration ranges and sample types within the scope of the method. Where a factor (such as ambient temperature) has varied representatively at random during the course of a precision experiment, for example, the effects of that factor appear directly in the observed variance and need no additional study unless further method optimisation is desirable.

In the context of method validation, “representative variation” means that the factor must take a distribution of values appropriate to the anticipated range of the parameter in question. For continuous measurable parameters, this may be a permitted range, stated uncertainty or expected range; for discontinuous factors, or factors with unpredictable effects such as sample matrix, a representative range corresponds to the variety of types or “factor levels” permitted or encountered in normal use of the method. Ideally, representativeness extends not only to the range of values, but to their distribution. Unfortunately, it is often uneconomic to arrange for full variation of many factors at many levels. For most practical purposes, however, tests based on extremes of the expected range, or on larger changes than anticipated, are an acceptable minimum.

In selecting factors for variation, it is important to ensure that the larger effects are ‘exercised’ as much as possible. For example, where day to day variation (perhaps arising from recalibration effects) is substantial compared to repeatability, two determinations on each of five days will provide a better estimate of intermediate precision than five determinations on each of two days. Ten single determinations on separate days will be better still, subject to sufficient control, though this will provide no additional information on within-day repeatability.
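
The point about exercising the larger effects can be made concrete by counting degrees of freedom. The sketch below uses a hypothetical helper `dof` for a balanced design of p runs with n replicates each:

```python
# Degrees of freedom available for each variance component in a balanced
# design of p runs (e.g. days) with n replicates per run (illustrative).
def dof(p, n):
    return {"between_run": p - 1, "within_run": p * (n - 1)}

print(dof(p=5, n=2))  # between_run d.f. = 4: day effect well exercised
print(dof(p=2, n=5))  # between_run d.f. = 1: day effect poorly estimated
```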

Clearly, in planning significance checks, any study should have sufficient power to detect such effects before they become practically important (that is, comparable to the largest component of uncertainty).

In addition, the following considerations may be important:

Where factors are known or suspected to interact, it is important to ensure that the effect of interaction is accounted for. This may be achieved either by ensuring random selection from different levels of interacting parameters, or by careful systematic design to obtain ‘interaction’ effects or covariance information.

In carrying out studies of overall bias, it is important that the reference materials and values are relevant to the materials under routine test.

  6. Extent of validation studies

The extent to which a laboratory has to undertake validation of a new, modified or unfamiliar method depends to a degree on the existing status of the method and the competence of the laboratory. Suggestions as to the extent of validation and verification measures for different circumstances are given below. Except where stated, it is assumed that the method is intended for routine use.

6.1.  The laboratory is to use a “fully” validated method

The method has been studied in a collaborative trial and so the laboratory has to verify that it is capable of achieving the published performance characteristics of the method (or is otherwise able to fulfil the requirements of the analytical task). The laboratory should undertake precision studies, bias studies (including matrix variation studies), and possibly linearity studies, although some tests such as that for ruggedness may be omitted.

6.2.  The laboratory is to use a fully validated method, but new matrix is to be used

The method has been studied in a collaborative trial and so the laboratory has to verify that the new matrix introduces no new sources of error into the system. The same range of validation as in the previous case is required.

6.3.  The laboratory is to use a well-established, but not collaboratively studied, method

The same range of validation as in the previous cases is required.

6.4.  The method has been published in the scientific literature together with some analytical characteristics

The laboratory should undertake precision studies, bias studies (including matrix variation studies), ruggedness and linearity studies.

6.5.  The method has been published in the scientific literature with no characteristics given or has been developed in-house

The laboratory should undertake precision studies, bias studies (including matrix variation studies), ruggedness and linearity studies.

6.6.  The method is empirical

An empirical method is one in which the quantity estimated is simply the result found on following the stated procedure. This differs from measurements intended to assess method-independent quantities such as the concentration of a particular analyte in a sample, in that the method bias is conventionally zero, and matrix variation (that is, within the defined class) is irrelevant. Laboratory bias cannot be ignored, but is likely to be difficult to estimate by single-laboratory experiment. Moreover, reference materials are unlikely to be available. In the absence of collaborative trial data some estimate of interlaboratory precision could be obtained from a specially designed ruggedness study or estimated by using the Horwitz function.

6.7.  The analysis is “ad hoc”

“Ad hoc” analysis is occasionally necessary to establish the general range of a value, without great expenditure and with low criticality. The effort that can go into validation is accordingly strictly limited. Bias should be studied by methods such as recovery estimation or analyte additions, and precision by replication.
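
A bias check by analyte addition can be sketched as follows (values hypothetical):

```python
# Marginal recovery estimated by analyte addition (values hypothetical).
unspiked = 4.1  # result for the test material as received (mg/kg)
added = 5.0     # amount of analyte added (mg/kg)
spiked = 9.0    # result for the spiked portion (mg/kg)

recovery = (spiked - unspiked) / added
print(f"recovery = {recovery:.2f}")
```

A recovery close to 1, together with precision from simple replication, is often sufficient evidence for this low-criticality, range-finding kind of analysis.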

6.8.  Changes in staff and equipment

Important examples include: change in major instruments; new batches of very variable reagents (for example, polyclonal antibodies); changes made in the laboratory premises; methods used for the first time by new staff; or a validated method employed after a period of disuse. Here the essential action is to demonstrate that no deleterious changes have occurred.  The minimum check is a single bias test; a “before and after” experiment on typical test materials or control materials. In general, the tests carried out should reflect the possible impact of the change on the analytical procedure.

  7. Recommendations

The following recommendations are made regarding the use of single-laboratory method validation:

Wherever possible and practical a laboratory should use a method of analysis that has had its performance characteristics evaluated through a collaborative trial conforming to an international protocol.

Where such methods are not available, a method must be validated in-house before being used to generate analytical data for a customer.

Single-laboratory validation requires the laboratory to select appropriate characteristics for evaluation from the following: applicability, selectivity, calibration, accuracy, precision, range, limit of quantification, limit of detection, sensitivity, ruggedness and practicability. The laboratory must take account of customer requirements in choosing which characteristics are to be determined.

Evidence that these characteristics have been assessed must be made available to customers of the laboratory if required by the customer.

  8. References
  • "Protocol for the Design, Conduct and Interpretation of Method Performance Studies", W Horwitz, Pure Appl. Chem., 1988, 60, 855-864, revised W. Horwitz, Pure Appl. Chem., 1995, 67, 331-343.
  • “The International Harmonised Protocol for the Proficiency Testing of (Chemical) Analytical Laboratories”, M Thompson and R Wood, Pure Appl. Chem., 1993, 65, 2123-2144. (Also published in J. AOAC International, 1993, 76, 926-940.)
  • “Harmonised Guidelines For Internal Quality Control in Analytical Chemistry Laboratories”, Michael Thompson and Roger Wood,  J. Pure & Applied Chemistry, 1995, 67(4), 49-56.
  • “Harmonised Guidelines for the Use of Recovery Information in Analytical Measurement”, Michael Thompson, Stephen Ellison, Ales Fajgelj, Paul Willetts and Roger Wood, J. Pure & Applied Chemistry, 1999, 71(2), 337-348.
  • “Council Directive 93/99/EEC on the Subject of Additional Measures Concerning the Official Control of Foodstuffs”, O. J., 1993, L290.
  • “Procedural Manual of the Codex Alimentarius Commission, 10th Edition”, FAO, Rome, 1997.
  • “Precision of Test Methods”, Geneva, 1994, ISO 5725, Previous editions were issued in 1981 and 1986.
  • “Guide to the Expression of Uncertainty in Measurement”, ISO, Geneva, 1993.
  • “Quantifying Uncertainty in Analytical Measurement”, EURACHEM Secretariat, Laboratory of the Government Chemist, Teddington, UK, 1995, EURACHEM Guide (under revision).
  • “International vocabulary of basic and general terms in metrology” ISO, Geneva 1993
  • “Validation of Chemical Analytical Methods”, NMKL Secretariat, Finland, 1996, NMKL Procedure No. 4.
  • “EURACHEM Guide: The fitness for purpose of analytical methods. A Laboratory Guide to method validation and related topics”, LGC, Teddington 1996. Also available from the EURACHEM Secretariat and website.
  • “Statistics manual of the AOAC”, AOAC INTERNATIONAL, Gaithersburg, Maryland, USA, 1975
  • “An Interlaboratory Analytical Method Validation Short Course developed by the AOAC INTERNATIONAL”, AOAC INTERNATIONAL, Gaithersburg, Maryland, USA, 1996.
  • “Text on validation of analytical procedures” International Conference on Harmonisation. Federal Register, Vol. 60, March 1, 1995, page 11260.
  • “Validation of analytical procedures: Methodology” International Conference on Harmonisation. Federal Register, Vol. 62, No. 96, May 19, 1997, pages 27463-27467.
  • “Validation of Methods”, Inspectorate for Health Protection, Rijswijk, The Netherlands, Report 95-001.
  • “A Protocol for Analytical Quality Assurance in Public Analysts’ Laboratories”, Association of Public Analysts, 342 Coleford Road, Sheffield S9 5PH, UK, 1986.
  • “Validation of Analytical Methods for Food Control”, Report of a Joint FAO/IAEA Expert Consultation, December 1997, FAO Food and Nutrition Paper No. 68, FAO, Rome, 199
  • “Estimation and Expression of Measurement Uncertainty in Chemical Analysis”, NMKL Secretariat, Finland, 1997, NMKL Procedure No. 5.
  • M Thompson and PJ Lowthian, J. AOAC Int., 1997, 80, 676-679.
  • IUPAC recommendation: “Nomenclature in Evaluation of Analytical Methods, Including Quantification and Detection Capabilities”, Pure and Applied Chem., 1995, 67, 1699-1723.
  • ISO 11843. “Capability of detection.” (Several parts). International Standards Organisation, Geneva.
  • M. Thompson, Analyst, 2000, 125, 2020-2025.
  • “Recent trends in inter-laboratory precision at ppb and sub-ppb concentrations in relation to fitness for purpose criteria in proficiency testing” M Thompson, Analyst, 2000, 125, 385-386.
  • “How to combine proficiency test results with your own uncertainty estimate - the zeta score”, Analytical Methods Committee of the Royal Society of Chemistry, AMC Technical Briefs, editor M. Thompson, AMC Technical Brief No. 2, www.rsc.org/lap/rsccom/amc

Annex A: Notes on the requirements for study of method performance characteristics

The general requirements for the individual performance characteristics for a method are as follows.

A.1 Applicability

After validation the documentation should provide, in addition to any performance specification, the following information:

  • the identity of the analyte, including speciation where appropriate (Example: ‘total arsenic’);
  • the concentration range covered by the validation (Example: ‘0-50 ppm’);
  • a specification of the range of matrices of the test material covered by the validation (Example: ‘seafood’);
  • a protocol, describing the equipment, reagents, procedure (including permissible variation in specified instructions, e.g., ‘heat at 100 ± 5 °C for 30 ± 5 minutes’), calibration and quality procedures, and any special safety precautions required;
  • the intended application and its critical uncertainty requirements (Example:  ‘The analysis of food for screening purposes.  The standard uncertainty u(c) of the result c should be less than 0.1c.’).

A.2 Selectivity

Selectivity is the degree to which a method can quantify the analyte accurately in the presence of interferents. Ideally, selectivity should be evaluated for any important interferent likely to be present. It is particularly important to check interferents which are likely, on chemical principles, to respond to the test. For example, colorimetric tests for ammonia might reasonably be expected to respond to primary aliphatic amines. It may be impracticable to consider or test every potential interferent; where that is the case, it is recommended that the likely worst cases are checked. As a general principle, selectivity should be sufficiently good for any interferences to be ignored.

In many types of analysis, selectivity is essentially a qualitative assessment based on the significance or otherwise of suitable tests for interference. However, there are useful quantitative measures. In particular, the selectivity index b_a/b_i, where b_a is the sensitivity of the method (the slope of the calibration function) and b_i the slope of the response independently produced by a potential interferent, provides a quantitative measure of interference. b_i can be determined approximately by execution of the procedure on a matrix blank and on the same blank spiked with the potential interferent at one appropriate concentration. If a matrix blank is unavailable, and a typical material is used instead, b_i can be estimated from such a simple experiment only under the assumption that mutual matrix effects are absent. Note that b_i is more easily determined in the absence of the analyte, because when the sensitivity to the analyte is itself affected by the interferent (a matrix effect), that effect might be confused with another type of interference.

A.3 Calibration and linearity

With the exception of gross errors in the preparation of calibration materials, calibration errors are usually (but not always) a minor component of the total uncertainty budget, and can usually be safely subsumed into various categories estimated by “top-down” methods. For example, random errors resulting from calibration are part of the run bias, which is assessed as a whole, while systematic errors from that source may appear as laboratory bias, likewise assessed as a whole. Nevertheless, there are some characteristics of calibration that are useful to know at the outset of method validation, because they affect the strategy for the optimal development of the procedure. In this class are such questions as whether the calibration function plausibly (a) is linear, (b) passes through the origin and (c) is unaffected by the matrix of the test material. The procedures described here relate to calibration studies in validation, which are necessarily more exacting than calibration undertaken during routine analysis. For example, once it is established at validation that a calibration function is linear and passes through the origin, a much simpler calibration strategy can be used for routine use (for example, a two-point repeated design). Errors from this simpler calibration strategy will normally be subsumed into higher-level errors for validation purposes.

A3.1.       Linearity and intercept

Linearity can be tested informally by examination of a plot of residuals produced by linear regression of the responses on the concentrations in an appropriate calibration set. Any curved pattern suggests lack of fit due to a non-linear calibration function. A test of significance can be undertaken by comparing the lack-of-fit variance with that due to pure error. However, there are causes of lack of fit other than nonlinearity that can arise in certain types of analytical calibration, so the significance test must be used in conjunction with a residual plot. Despite its current widespread use as an indication of quality of fit, the correlation coefficient is misleading and inappropriate as a test for linearity and should not be used.

Design is all-important in tests for lack of fit, because it is easy to confound nonlinearity with drift. Replicate measurements are needed to provide an estimate of pure error if there is no independent estimate. In the absence of specific guidance, the following should apply:

  • there should be six or more calibrators;
  • the calibrators should be evenly spaced over the concentration range of interest;
  • the range should encompass 0-150% or 50-150% of the concentration likely to be encountered, depending on which of these is the more suitable;
  • the calibrators should be run at least in duplicate, and preferably triplicate or more, in a random order.

After an exploratory fit with simple linear regression, the residuals should be examined for obvious patterns. Heteroscedasticity is quite common in analytical calibration and a pattern suggesting it means that the calibration data are best treated by weighted regression. Failure to use weighted regression in these circumstances could give rise to exaggerated errors at the low end of the calibration function.

The test for lack of fit can be carried out with either simple or weighted regression. A test for an intercept significantly different from zero can also be made on this data if there is no significant lack of fit.
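As an illustration, the lack-of-fit test described above can be sketched as follows. The calibration data are hypothetical, and the tabulated F critical value is assumed; this is a minimal sketch, not a prescribed procedure:

```python
import statistics

# Duplicate responses at six evenly spaced calibrator concentrations (hypothetical)
conc = [0, 10, 20, 30, 40, 50]
dups = [(0.02, 0.05), (1.01, 0.98), (2.02, 1.97),
        (3.05, 2.99), (3.98, 4.03), (5.01, 4.97)]

x = [c for c in conc for _ in range(2)]
y = [r for pair in dups for r in pair]
n = len(x)

# Ordinary least-squares slope and intercept
xbar, ybar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar

# Residual sum of squares about the fitted line
ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Pure-error sum of squares from the duplicates (df = 6)
ss_pe = sum((a - b) ** 2 / 2 for a, b in dups)
df_pe = len(dups)

# Lack-of-fit sum of squares (df = n - 2 - df_pe = 4)
ss_lof = ss_resid - ss_pe
df_lof = n - 2 - df_pe

f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
F_CRIT = 4.53  # F(0.95; 4, 6), taken from tables
print(f"F = {f_stat:.2f}; lack of fit {'significant' if f_stat > F_CRIT else 'not significant'}")
```

A residual plot should still be examined alongside the F statistic, as noted above, since causes of lack of fit other than nonlinearity exist.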

A3.2.       Test for general matrix effect

It simplifies calibration enormously if the calibrators can be prepared as a simple solution of the analyte. The effects of a possible general matrix mismatch must be assessed in validation if this strategy is adopted. A test for general matrix effect can be made by applying the method of analyte additions (also called “standard additions”) to a test solution derived from a typical test material. The test should be done in a way that provides the same final dilution as the normal procedure produces, and the range of additions should encompass the same range as the procedure-defined calibration. If the calibration is linear, the slopes of the usual calibration function and the analyte additions plot can be compared for significant difference. A lack of significance means that there is no detectable general matrix effect. If the calibration is not linear, a more complex method is needed for a significance test, but a visual comparison at equal concentrations will usually suffice. A lack of significance in this test will often mean that the matrix variation effect [Section A13] will also be absent.
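The slope comparison for a linear calibration can be sketched as below; the data and the tabulated t critical value are hypothetical illustrations:

```python
import math
import statistics

def ols(x, y):
    """Return slope, intercept and the slope's standard error for simple OLS."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, intercept, se_slope

# Ordinary calibration: simple analyte solutions (hypothetical)
cal_x = [0, 5, 10, 15, 20]
cal_y = [0.01, 0.52, 1.00, 1.49, 2.02]

# Analyte additions to a test solution at the same final dilution (hypothetical)
add_x = [0, 5, 10, 15, 20]              # added concentration
add_y = [0.40, 0.89, 1.41, 1.88, 2.42]  # response of spiked test solution

b_cal, _, se_cal = ols(cal_x, cal_y)
b_add, _, se_add = ols(add_x, add_y)

# t-test on the difference of the two slopes (df = (5-2) + (5-2) = 6)
t_stat = abs(b_cal - b_add) / math.sqrt(se_cal**2 + se_add**2)
T_CRIT = 2.447  # t(0.975, 6), taken from tables
print(f"slopes {b_cal:.4f} vs {b_add:.4f}, t = {t_stat:.2f}")
print("general matrix effect detected" if t_stat > T_CRIT else "no detectable matrix effect")
```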

A3.3.       Final calibration procedure

The calibration strategy as specified in the procedure may also need to be separately validated, although the errors involved will contribute to jointly estimated uncertainties. The important point here is that uncertainties estimated from the specific designs for linearity etc. will be smaller than those derived from the simpler calibration defined in the procedure protocol.

A.4 Trueness

 

A4.1.       Estimation of trueness

Trueness is the closeness of agreement between a test result and the accepted reference value of the property being measured. Trueness is stated quantitatively in terms of “bias”, with smaller bias indicating greater trueness. Bias is typically determined by comparing the response of the method to a reference material with the known value assigned to the material. Significance testing is recommended. Where the uncertainty in the reference value is not negligible, evaluation of the results should consider the reference material uncertainty as well as the statistical variability.

A4.2.       Conditions for trueness experiments

Bias can arise at different levels of organisation in an analytical system, for example, run bias, laboratory bias and method bias. It is important to remember which of these is being handled by the various methods of addressing bias. In particular:

  • The mean of a series of analyses of a reference material, carried out wholly within a single run, gives information about the sum of method, laboratory and run effect for that particular run. Since the run effect is assumed to be random from run to run, the result will vary from run to run more than would be expected from the observable dispersion of the results, and this needs to be taken into account in the evaluation of the results (for example, by testing the measured bias against the among-runs standard deviation investigated separately).
  • The mean of repeated analyses of a reference material in several runs estimates the combined effect of method and laboratory bias in the particular laboratory (except where the value is assigned using the particular method).

A4.3.       Reference values for trueness experiments

A.4.3.1.                       Certified reference materials (CRMs)

CRMs are traceable to international standards with a known uncertainty and therefore can be used to address all aspects of bias (method, laboratory and within-laboratory) simultaneously, assuming that there is no matrix mismatch. CRMs should accordingly be used in validation of trueness where it is practicable to do so. It is important to ensure that the certified value uncertainties are sufficiently small to permit detection of a bias of important magnitude. Where they are not, the use of CRMs is still recommended, but additional checks should be carried out.

A typical trueness experiment generates a mean response on a reference material. In interpreting the result, the uncertainty associated with the certified value should be taken into account along with the uncertainty arising from statistical variation in the laboratory. The latter term may be based on the within-run, between-run, or an estimate of the between-laboratory standard deviation, depending on the intent of the experiment. Where the certified value uncertainty is small, a Student’s t-test is normally carried out, using the appropriate precision term.
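A minimal sketch of such a comparison, with a non-negligible certified-value uncertainty, is given below. All figures are hypothetical, and the simple k = 2 criterion is an assumed approximation rather than a full significance test:

```python
import math
import statistics

# Ten between-run results on a CRM (hypothetical data)
results = [10.12, 9.95, 10.21, 10.04, 9.88, 10.15, 10.02, 9.97, 10.11, 10.05]
cert_value, cert_u = 9.90, 0.04  # certified value and its standard uncertainty

xbar = statistics.mean(results)
s = statistics.stdev(results)
n = len(results)

# Combined uncertainty of the comparison: statistical variability plus
# the (non-negligible) certified-value uncertainty
u_comp = math.sqrt(s**2 / n + cert_u**2)
t_stat = abs(xbar - cert_value) / u_comp

# Approximate criterion: k = 2 (roughly 95 % for adequate degrees of freedom)
biased = t_stat > 2
print(f"bias = {xbar - cert_value:+.3f}, t = {t_stat:.2f}, significant: {biased}")
```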

Where necessary and practicable, a number of suitable CRMs, with appropriate matrices and analyte concentrations, should be examined. Where this is done, and the uncertainties on the certified values are smaller than those on the analytical results, it would be reasonably safe to use simple regression to evaluate the results. In this way bias could be expressed as a function of concentration, and might appear as a non-zero intercept (“translational” or constant bias) or as a non-unity slope (“rotational” or proportional bias). Due caution should be applied in interpreting the results where the range of matrices is large.

A.4.3.2.                       Reference materials

Where CRMs are not available, or as an addition to CRMs, use may be made of any material sufficiently well characterised for the purpose (a reference material10), bearing in mind always that while insignificant bias may not be proof of zero bias, significant bias on any material remains a cause for investigation. Examples of reference materials include: materials characterised by a reference material producer, but whose values are not accompanied by an uncertainty statement or are otherwise qualified; materials characterised by a manufacturer of the material; materials characterised in the laboratory for use as reference materials; and materials subjected to a restricted round-robin exercise, or distributed in a proficiency test. While the traceability of these materials may be questionable, it would be far better to use them than to conduct no assessment for bias at all. The materials would be used in much the same way as CRMs, though with no stated uncertainty any significance test relies wholly on the observable precision of results.

A.4.3.3.                       Use of a reference method

A reference method can in principle be used to test for bias in another method under validation. This is a useful option when checking an alternative to, or modification of, an established standard method already validated and in use in the laboratory. Both methods are used to analyse a number of typical test materials, preferably covering a useful range of concentration fairly evenly. Comparison of the results over the range by a suitable statistical method (for example, a paired t-test, with due checks for homogeneity of variance and normality) would demonstrate any bias between the methods.
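The paired comparison mentioned above can be sketched as follows, with hypothetical results for eight test materials and a tabulated t critical value; checks for homogeneity of variance and normality are omitted from this sketch:

```python
import math
import statistics

# Results from the reference and candidate methods on the same eight
# test materials, covering the range fairly evenly (hypothetical data)
ref = [2.1, 4.8, 7.5, 10.2, 14.9, 19.8, 25.3, 30.1]
cand = [2.3, 4.9, 7.8, 10.5, 15.2, 20.2, 25.6, 30.6]

# Paired differences, candidate minus reference
d = [c - r for r, c in zip(ref, cand)]
dbar = statistics.mean(d)
sd = statistics.stdev(d)
n = len(d)

t_stat = abs(dbar) / (sd / math.sqrt(n))
T_CRIT = 2.365  # t(0.975, 7), taken from tables
print(f"mean difference {dbar:+.3f}, t = {t_stat:.2f}")
print("bias between methods detected" if t_stat > T_CRIT else "no detectable bias")
```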

A.4.3.4.                       Use of spiking/recovery

In the absence of reference materials, or to support reference material studies, bias can be investigated by spiking and recovery. A typical test material is analysed by the method under validation both in its original state and after the addition (spiking) of a known mass of the analyte to the test portion. The difference between the two results as a proportion of the mass added is called the surrogate recovery or sometimes the marginal recovery. Recoveries significantly different from unity indicate that a bias is affecting the method.  Strictly, recovery studies as described here only assess bias due to effects operating on the added analyte; the same effects do not necessarily apply to the same extent to the native analyte, and additional effects may apply to the native analyte. Spiking/recovery studies are accordingly very strongly subject to the observation that while good recovery is not a guarantee of trueness, poor recovery is certainly an indication of lack of trueness. Methods of handling spiking/recovery data have been covered in detail elsewhere.4
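The surrogate recovery defined above reduces to a simple ratio; the figures in this sketch are hypothetical:

```python
# Surrogate (marginal) recovery from a single spiking experiment (hypothetical figures)
c_unspiked = 4.10   # result for the original test portion
c_spiked = 8.85     # result after spiking the test portion
mass_added = 5.00   # analyte added, in the same units as the results

# Recovery = (spiked result - unspiked result) / amount added; a significance
# test against 100 % would additionally need the precision of the two results
recovery = (c_spiked - c_unspiked) / mass_added
print(f"recovery = {recovery:.1%}")
```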

A.5 Precision

Precision is the closeness of agreement between independent test results obtained under stipulated conditions. It is usually specified in terms of standard deviation or relative standard deviation. The distinction between precision and bias is fundamental, but depends on the level at which the analytical system is viewed. Thus from the viewpoint of a single determination, any deviation affecting the calibration for the run would be seen as a bias. From the point of view of the analyst reviewing a year’s work, the run bias will be different every day and act like a random variable with an associated precision. The stipulated conditions for the estimation of precision take account of this change in viewpoint.

For single-laboratory validation, two sets of conditions are relevant: (a) precision under repeatability conditions, describing variations observed during a single run, with expectation 0 and standard deviation σ_r, and (b) precision under run-to-run conditions, describing variations in run bias, with expectation 0 and standard deviation σ_run. Usually both of these sources of error operate on individual analytical results, which therefore have a combined precision σ_tot = √(σ_run² + σ_r²/n), where n is the number of repeat results averaged within a run for the reported result. The two precision estimates can be obtained most simply by analysing the selected test material in duplicate in a number of successive runs. The separate variance components can then be calculated by the application of one-way analysis of variance. Each duplicate analysis must be an independent execution of the procedure applied to a separate test portion. Alternatively, the combined precision can be estimated directly by analysing the test material once in each of a number of successive runs, and estimating the standard deviation from the usual equation. (Note that observed standard deviations are generally given the symbol s, to distinguish them from the true standard deviations σ.)
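The duplicate-per-run design and the one-way analysis of variance can be sketched as follows; the duplicate results are hypothetical:

```python
import statistics

# Duplicate results for the same test material in eight successive runs (hypothetical)
runs = [(5.2, 5.4), (5.6, 5.5), (5.1, 5.3), (5.8, 5.7),
        (5.3, 5.2), (5.9, 5.8), (5.4, 5.6), (5.5, 5.4)]

m = len(runs)
run_means = [statistics.mean(pair) for pair in runs]
grand = statistics.mean(run_means)

# One-way ANOVA with two replicates per run
ms_within = sum((a - b) ** 2 / 2 for a, b in runs) / m                 # df = m
ms_between = 2 * sum((rm - grand) ** 2 for rm in run_means) / (m - 1)  # df = m - 1

s_r = ms_within ** 0.5                           # repeatability standard deviation
s_run2 = max((ms_between - ms_within) / 2, 0.0)  # run-effect variance component
s_run = s_run2 ** 0.5

# Combined precision of a reported result averaging n = 2 within-run repeats
s_combined = (s_run2 + ms_within / 2) ** 0.5
print(f"s_r = {s_r:.3f}, s_run = {s_run:.3f}, combined = {s_combined:.3f}")
```

The `max(..., 0.0)` guards against a negative variance component estimate, which can occur by chance when the run effect is small.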

It is important that the precision values are representative of likely test conditions. First, the variation in conditions among the runs must represent what would normally happen in the laboratory under routine use of the method. For instance, variations in reagent batches, analysts and instruments should be representative. Second, the test material used should be typical, in terms of matrix and (ideally) state of comminution, of the materials likely to be encountered in routine application. So actual test materials or, to a lesser degree, matrix-matched reference materials would be suitable, but standard solutions of the analyte would not. Note also that CRMs and prepared reference materials are frequently homogenised to a greater extent than typical test materials, and precision obtained from their analysis may accordingly underestimate the variation that will be observed for test materials.

Precision very often varies with analyte concentration.  Typical assumptions are i) that there is no change in precision with analyte level, or ii) that the standard deviation is proportional to, or linearly dependent on, analyte level.  In both cases, the assumption needs to be checked if the analyte level is expected to vary substantially (that is, by more than about 30% from its central value).  The most economical experiment is likely to be a simple assessment of precision at or near the extremes of the operating range, together with a suitable statistical test for difference in variance.  The F-test is appropriate for normally distributed error.
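A sketch of the F-test at the extremes of the operating range follows; the replicate data and the tabulated two-sided F critical value are hypothetical:

```python
import statistics

# Replicate results near the bottom and the top of the operating range (hypothetical)
low = [1.02, 0.98, 1.05, 0.99, 1.04, 0.96]
high = [49.1, 50.8, 48.7, 51.2, 49.6, 50.4]

v_low = statistics.variance(low)
v_high = statistics.variance(high)

# Two-sided F-test for equality of variances (df = 5, 5)
f_stat = max(v_low, v_high) / min(v_low, v_high)
F_CRIT = 7.15  # F(0.975; 5, 5), taken from tables
print(f"F = {f_stat:.1f}; variances {'differ' if f_stat > F_CRIT else 'comparable'}")
```

A significant difference here would argue against assuming constant precision over the range, suggesting instead a level-dependent standard deviation.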

Precision data may be obtained for a wide variety of different sets of conditions in addition to the minimum of repeatability and between-run conditions indicated here, and it may be appropriate to acquire additional information. For example, it may be useful to the assessment of results, or for improving the measurement, to have an indication of separate operator and run effects, between or within-day effects or the precision attainable using one or several instruments. A range of different designs and statistical analysis techniques is available, and careful experimental design is strongly recommended in all such studies.

A.6 Recovery

Methods for estimating recovery are discussed in conjunction with methods of estimating trueness (above).

A.7 Range

The validated range is the interval of analyte concentration within which the method can be regarded as validated. It is important to realise that this range is not necessarily identical to the useful range of the calibration. While the calibration may cover a wide concentration range, the remainder of the validation (usually the much more important part in terms of uncertainty) will cover a more restricted range. In practice, most methods will be validated at only one or two levels of concentration. The validated range may be taken as a reasonable extrapolation from these points on the concentration scale.

When the use of the method focuses on a concentration of interest well above the detection limit, validation near that one critical level would be appropriate. It is impossible to define a general safe extrapolation of this result to other concentrations of the analyte, because much depends on the individual analytical system. Therefore the validation study report should state the range around the critical value in which the person carrying out the validation, using professional judgement, regards the estimated uncertainty to hold true.

When the concentration range of interest approaches zero, or the detection limit, it is incorrect to assume either constant absolute uncertainty or constant relative uncertainty. A useful approximation in this common circumstance is to assume a linear functional relationship, with a positive intercept, between uncertainty u and concentration c, that is, of the form

u(c) = u_0 + θc

where θ is the relative uncertainty estimated at some concentration well above the detection limit and u_0 is the standard uncertainty estimated for zero concentration (in some circumstances u_0 could be estimated from the standard deviation of results on blank or low-level material, s_0, as used for the detection limit in section A.8). In these circumstances it would be reasonable to regard the validated range as extending from zero to a small integer multiple of the upper validation point. Again this would depend on professional judgement.
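The behaviour of such a linear uncertainty function, u(c) = u_0 + θc, can be illustrated with hypothetical parameter values; note how neither the absolute nor the relative uncertainty is constant near zero:

```python
# Approximate uncertainty function near zero concentration (sketch,
# with hypothetical parameter values)
u0 = 0.02     # standard uncertainty at zero concentration
theta = 0.05  # relative uncertainty well above the detection limit

def u(c):
    """Standard uncertainty as a linear function of concentration."""
    return u0 + theta * c

for c in (0.0, 0.1, 1.0, 10.0):
    rel = "-" if c == 0 else format(u(c) / c, ".2f")
    print(f"c = {c:5.1f}  u = {u(c):.3f}  u/c = {rel}")
```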

A.8 Detection Limit

In broad terms the detection limit (limit of detection) is the smallest amount or concentration of analyte in the test sample that can be reliably distinguished from zero.22,23 For analytical systems where the validation range does not include or approach it, the detection limit does not need to be part of a validation.

Despite the apparent simplicity of the idea, the whole subject of the detection limit is beset with problems outlined below:

  • There are several possible conceptual approaches to the subject, each providing a somewhat different definition of the limit. Attempts to clarify the issue seem ever more confusing.
  • Although each of these approaches depends on an estimate of precision at or near zero concentration, it is not clear whether this should be taken as implying repeatability conditions or some other conditions for the estimation.
  • Unless an inordinate amount of data is collected, estimates of the detection limit will be subject to quite large random variation.
  • Estimates of the detection limit are often biased on the low side because of operational factors.
  • Statistical inferences relating to the detection limit depend on the assumption of normality, which is at least questionable at low concentrations.

For most practical purposes in method validation, it seems better to opt for a simple definition, leading to a quickly implemented estimation which is used only as a rough guide to the utility of the method. However, it must be recognised that the detection limit as estimated in method development, may not be identical in concept or numerical value to one used to characterise a complete analytical method. For instance the “instrumental detection limit”, as quoted in the literature or in instrument brochures and then adjusted for dilution, is often far smaller than a “practical” detection limit and inappropriate for method validation.

It is accordingly recommended that for method validation, the precision estimate used (s_0) should be based on at least 6 independent complete determinations of analyte concentration in a typical matrix blank or low-level material, with no censoring of zero or negative results, and the approximate detection limit calculated as 3s_0. Note that with the recommended minimum number of degrees of freedom, this value is quite uncertain, and may easily be in error by a factor of two. Where more rigorous estimates are required (for example to support decisions based on detection or otherwise of a material), reference should be made to appropriate guidance (see, for example, references 22-23).
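The recommended estimate reduces to a short calculation; the blank results below are hypothetical, and negative results are deliberately retained:

```python
import statistics

# Six independent complete determinations on a typical matrix blank,
# with no censoring of zero or negative results (hypothetical data)
blank = [0.8, -0.3, 0.5, -0.1, 0.9, 0.2]

s0 = statistics.stdev(blank)          # precision estimate at/near zero
detection_limit = 3 * s0              # approximate detection limit, 3*s0
print(f"s0 = {s0:.3f}, approximate detection limit = {detection_limit:.2f}")
```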

A.9            Limit of determination or limit of quantification

It is sometimes useful to state a concentration below which the analytical method cannot operate with an acceptable precision. Sometimes that precision is arbitrarily defined as 10% RSD, sometimes the limit is equally arbitrarily taken as a fixed multiple (typically 2) of the detection limit. While it is to a degree reassuring to operate above such a limit, we must recognise that it is a quite artificial dichotomy of the concentration scale: measurements below such a limit are not devoid of information content and may well be fit for purpose. Hence the use of this type of limit in validation is not recommended here. It is preferable to try to express the uncertainty of measurement as a function of concentration and compare that function with a criterion of fitness for purpose agreed between the laboratory and the client or end-user of the data.

A.10         Sensitivity

The sensitivity of a method is the gradient of the calibration function. As this is usually arbitrary, depending on instrumental settings, it is not useful in validation. (It may be useful in quality assurance procedures, however, to test whether an instrument is performing to a consistent and satisfactory standard.)

A.11         Ruggedness

The ruggedness of an analytical method is the resistance to change in its results when minor deviations are made from the experimental conditions described in the procedure. The limits for experimental parameters should be prescribed in the method protocol (although this has not always been done in the past), and such permissible deviations, separately or in any combination, should produce no meaningful change in the results produced. (A “meaningful change” here would imply that the method could not operate within the agreed limits of uncertainty defining fitness for purpose.) The aspects of the method which are likely to affect results should be identified, and their influence on method performance evaluated by using ruggedness tests.

The ruggedness of a method is tested by deliberately introducing small changes to the procedure and examining the effect on the results. A number of aspects of the method may need to be considered, but because most of these will have a negligible effect it will normally be possible to vary several at once. An economical experiment based on fractional factorial designs has been described by Youden13. For instance, it is possible to formulate an approach utilising 8 combinations of 7 variable factors, that is to look at the effects of seven parameters with just eight analytical results. Univariate approaches are also feasible, where only one variable at a time is changed.
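One arrangement of the 8-run, 7-factor design can be sketched as follows; the design matrix shown is one common form of such a fractional factorial, and the run results are hypothetical:

```python
# An 8-run fractional factorial design for seven factors (Youden-type):
# each row gives the level (+1 or -1) of factors A..G in one analytical run
DESIGN = [
    [+1, +1, +1, +1, +1, +1, +1],
    [+1, +1, -1, +1, -1, -1, -1],
    [+1, -1, +1, -1, +1, -1, -1],
    [+1, -1, -1, -1, -1, +1, +1],
    [-1, +1, +1, -1, -1, +1, -1],
    [-1, +1, -1, -1, +1, -1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, -1, -1, +1, +1, +1, -1],
]

# Hypothetical results of the eight analytical runs
results = [10.1, 10.3, 9.9, 10.2, 10.0, 9.8, 10.4, 10.1]

# Effect of each factor: mean of the four high-level runs minus
# mean of the four low-level runs
effects = []
for j in range(7):
    hi = [r for row, r in zip(DESIGN, results) if row[j] == +1]
    lo = [r for row, r in zip(DESIGN, results) if row[j] == -1]
    effects.append(sum(hi) / 4 - sum(lo) / 4)

for name, e in zip("ABCDEFG", effects):
    print(f"factor {name}: effect {e:+.3f}")
```

Factors with effects large relative to the method's precision would be candidates for tighter control in the protocol.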

Examples of the factors that a ruggedness test could address are: changes in the instrument, operator, or brand of reagent; concentration of a reagent; pH of a solution; temperature of a reaction; time allowed for completion of a process etc.

A.12         Fitness for Purpose

Fitness for purpose is the extent to which the performance of a method matches the criteria, agreed between the analyst and the end-user of the data, that describe the end-user’s needs. For instance the errors in data should not be of a magnitude that would give rise to incorrect decisions more often than a defined small probability, but they should not be so small that the end-user is involved in unnecessary expenditure. Fitness for purpose criteria could be based on some of the characteristics described in this Annex, but ultimately will be expressed in terms of acceptable total uncertainty.

A.13         Matrix variation

Matrix variation is, in many sectors, one of the most important but least acknowledged sources of error in analytical measurements. When we define the analytical system to be validated by specifying, amongst other things, the matrix of the test material, there may be scope for considerable variation within the defined class. To cite an extreme example, a sample of the class “soil” could be composed of clay, sand, chalk, laterite (mainly Fe₂O₃ and Al₂O₃), peat, etc., or of mixtures of these. It is easy to imagine that each of these types would contribute a unique matrix effect on an analytical method such as atomic absorption spectrometry. If we have no information about the type of soils we are analysing, there will be an extra uncertainty in the results because of this variable matrix effect.

Matrix variation uncertainties need to be quantified separately, because they are not taken into account elsewhere in the process of validation. The information is acquired by collecting a representative set of the matrices likely to be encountered within the defined class, all with analyte concentrations in the appropriate range. The materials are analysed according to the protocol, and the bias in the results estimated. Unless the test materials are CRMs, the bias estimate will usually have to be undertaken by means of spiking and recovery estimation. The uncertainty is estimated by the standard deviation of the biases. (Note: this estimate will also contain a variance contribution from the repeated analyses; if spiking has been used this contribution has magnitude 2σ_r². If a strict uncertainty budget is required, this term should be deducted from the matrix variation variance to avoid double accounting.)
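The calculation, including the deduction of the analytical-error contribution for a strict budget, can be sketched as follows with hypothetical bias estimates:

```python
import statistics

# Recovery-based bias estimates (as concentrations) for a representative
# set of matrices within the defined class (hypothetical data)
biases = [0.12, -0.08, 0.25, -0.15, 0.05, 0.18, -0.21, 0.09]

u_matrix_raw = statistics.stdev(biases)

# Each bias came from a spiked/unspiked pair of analyses, so the raw estimate
# includes an analytical-error contribution of 2*s_r**2; deduct it for a
# strict uncertainty budget (s_r here is a hypothetical repeatability s.d.)
s_r = 0.06
u_matrix2 = max(u_matrix_raw**2 - 2 * s_r**2, 0.0)
u_matrix = u_matrix2 ** 0.5
print(f"raw = {u_matrix_raw:.3f}, corrected = {u_matrix:.3f}")
```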

A.14         Measurement Uncertainty

The formal approach to measurement uncertainty estimation calculates a measurement uncertainty estimate from an equation, or mathematical model. The procedures described as method validation are designed to ensure that the equation used to estimate the result, with due allowance for random errors of all kinds, is a valid expression embodying all recognised and significant effects upon the result. It follows that, with one caveat elaborated further below, the equation or ‘model’ subjected to validation may be used directly to estimate measurement uncertainty. This is done by following established principles, based on the ‘law of propagation of uncertainty’, which for independent input effects is

u(y(x_1, x_2, ...)) = √( Σ_i c_i²u(x_i)² )

where y(x_1, x_2, ...) is a function of several independent variables x_1, x_2, ..., and c_i is a sensitivity coefficient evaluated as c_i = ∂y/∂x_i, the partial differential of y with respect to x_i. u(x_i) and u(y) are standard uncertainties, that is, measurement uncertainties expressed in the form of standard deviations. Since u(y(x_1, x_2, ...)) is a function of several separate uncertainty estimates, it is referred to as a combined standard uncertainty.

To estimate measurement uncertainty from the equation y = f(x_1, x_2, ...) used to calculate the result, it is therefore necessary first to establish the uncertainties u(x_i) in each of the terms x_1, x_2, etc.; second, to combine these with the additional terms required to represent random effects as found in validation; and finally to take into account any further effects. In the discussion of precision above, the implied statistical model is

y = f(x_1, x_2, ...) + δ_run + e

where δ_run is the run effect and e is the random error for a particular result. Since δ_run and e are known, from the precision experiments, to have standard deviations σ_run and σ_r respectively, these latter terms (or, strictly, their estimates s_run and s_r) are the uncertainties associated with these additional terms. Where the individual within-run results are averaged, the combined uncertainty associated with these two terms is (as given previously) √(s_run² + s_r²/n). Note that where the precision terms are shown to vary with analyte level, the uncertainty estimate for a given result must employ the precision term appropriate to that level. The basis for the uncertainty estimate accordingly follows directly from the statistical model assumed and tested in validation. To this estimate must be added any further terms as necessary to account for (in particular) inhomogeneity and matrix effect (see section A13). Finally, the calculated standard uncertainty is multiplied by a ‘coverage factor’, k, to provide an expanded uncertainty, that is, “an interval expected to encompass a large fraction of the distribution of values that may be attributed to the measurand”8. Where the statistical model is well established, the distribution known to be normal, and the number of degrees of freedom associated with the estimate is high, k is generally chosen to be equal to 2. The expanded uncertainty then corresponds approximately to a 95% confidence interval.

There is one important caveat to be added here. In testing the assumed statistical model, imperfect tests are perforce used. It has already been noted that these tests cannot prove that any effect is identically zero; they can only show that an effect is too small to detect within the uncertainty associated with the particular test for significance. A particularly important example is the test for significant laboratory bias. Clearly, if this is the only test performed to confirm trueness, there must be some residual uncertainty as to whether the method is indeed unbiased or not. It follows that where such uncertainties are significant with respect to the uncertainty calculated so far, additional allowance should be made.

In the case of an uncertain reference value, the simplest allowance is the stated uncertainty for the material, combined with the statistical uncertainty in the test applied. A full discussion is beyond the scope of this text; reference 9 provides further detail. It is, however, important to note that while the uncertainty estimated directly from the assumed statistical model is the minimum uncertainty that can be associated with an analytical result, it will almost certainly be an underestimate; similarly, an expanded uncertainty based on the same considerations and using k=2 will not provide sufficient confidence.
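The “simplest allowance” described here amounts to adding the extra terms in quadrature to the precision-based estimate. A brief sketch, with all values hypothetical:

```python
import math

# Hypothetical standard uncertainties (same units as the result):
u_method = 0.12     # precision-based estimate from the statistical model
u_ref = 0.05        # stated standard uncertainty of the reference material value
u_bias_test = 0.04  # statistical uncertainty of the bias test
#                     (e.g. standard error of the mean in the trueness experiment)

# Allowance for residual doubt about trueness: combine in quadrature.
u_with_allowance = math.sqrt(u_method ** 2 + u_ref ** 2 + u_bias_test ** 2)
```

The combined value is necessarily larger than u_method alone, reflecting the point that the model-based figure is a minimum estimate.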

The ISO Guide8 recommends that for increased confidence, rather than arbitrarily adding terms, the value of k should be increased as required. Practical experience suggests that for uncertainty estimates based on a validated statistical model, but with no evidence beyond the validation studies to provide additional confidence in the model, k should not be less than 3. Where there is strong reason to doubt that the validation study is comprehensive, k should be increased further as required.

Annex B. Additional considerations for Uncertainty estimation in validation studies

B.1 Sensitivity analysis

The basic expression used in uncertainty estimation

u(y) = √( Σ ci²·u(xi)² )

requires the ‘sensitivity coefficients’ ci. It is common in uncertainty estimation to find that while a given influence factor xi has a known uncertainty u(xi), the coefficient ci is insufficiently characterised or not readily obtainable from the equation for the result. This is particularly common where an effect is not included in the measurement equation because it is not normally significant, or because the relationship is not sufficiently understood to justify a correction. For example, the effect of solution temperature on a room temperature extraction procedure is rarely established in detail.

Where it is desired to assess the uncertainty in a result associated with such an effect, it is possible to determine the coefficient experimentally. This is done most simply by changing xi and observing the effect on the result, in a manner very similar to basic ruggedness tests. In most cases, it is sufficient in the first instance to choose at most two values of xi other than the nominal value, and calculate an approximate gradient from the observed results. The gradient then gives an approximate value for ci = ∂y/∂xi. The term ci·u(xi) can then be determined. (Note that this is one practical method for demonstrating the significance or otherwise of a possible effect on the results).

In such an experiment, it is important that the change in result observed be sufficient for a reliable calculation of ci. This is difficult to predict in advance. However, given a permitted range for the influence quantity xi, or an expanded uncertainty for the quantity, that is expected to result in insignificant change, it is clearly important to assess ci from a larger range. It is accordingly recommended that for an influence quantity with an expected range a (where a might be, for example, the permitted range, expanded uncertainty interval or 95% confidence interval) the sensitivity experiment employ, where possible, a change of at least 4a to ensure reliable results.
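Such a finite-difference estimate of the sensitivity coefficient might be sketched as follows. All names and numbers here are hypothetical; in practice `yield_at` would stand for repeated experimental observation of the result while the influence quantity is deliberately varied.

```python
def sensitivity_coefficient(measure, x_nominal, delta):
    """Estimate ci = dy/dxi by a two-point finite difference: change the
    influence quantity xi by +/- delta about its nominal value and take
    the gradient of the observed results."""
    return (measure(x_nominal + delta) - measure(x_nominal - delta)) / (2 * delta)

# Hypothetical example: extraction yield varying with solution temperature,
# with an expected temperature range a of 2 degrees C about a 20 degree nominal.
def yield_at(temp_c):  # stand-in for the experimentally observed response
    return 0.95 + 0.004 * (temp_c - 20.0)

a = 2.0
# Span the experiment over 4a (delta of 2a either side), as recommended above.
c_T = sensitivity_coefficient(yield_at, x_nominal=20.0, delta=2 * a)
u_contribution = abs(c_T) * 1.0  # ci * u(xi), assuming u(T) = 1 degree C
```

The product ci·u(xi) obtained this way can then be entered into the combined uncertainty alongside the other terms.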

 

B.2 Judgement

It is not uncommon to find that while an effect is recognised and may be significant, it is not always possible to obtain a reliable estimate of uncertainty. In such circumstances, the ISO Guide makes it quite clear that a professionally considered estimate of the uncertainty is to be preferred to neglect of the uncertainty. Thus, where no estimate of uncertainty is available for a potentially important effect, the analyst should make their own best judgement of the likely uncertainty and apply that in estimating the combined uncertainty. Reference 8 gives further guidance on the use of judgement in uncertainty estimation.


[*] Sampling uncertainty in the strict sense of uncertainty due to the preparation of the laboratory sample from the bulk target is excluded from consideration in this document. Uncertainty associated with taking a test portion from the laboratory sample is an inseparable part of measurement uncertainty and is automatically included at various levels of the following analysis.

[+] Many alternative groupings or ‘partitions of error’ are possible and may be useful in studying particular sources of error in more detail or across a different range of situations. For example, the statistical model of ISO 5725 generally combines laboratory and run effects, while the uncertainty estimation procedure in the ISO GUM is well suited to assessing the effects of each separate and measurable influence on the result.

[*] This may not be applicable at concentrations less than 10 times the detection limit.

Recommendations on measurement uncertainty

OIV-MA-AS1-14 Recommendations on measurement uncertainty

Introduction

 

It is important that analysts are aware of the uncertainty associated with each analytical result and are able to estimate it. The measurement uncertainty may be derived by a number of procedures. Food analysis laboratories are required to be in control, to use collaboratively tested methods when available, and to verify their application before taking them into routine use. Such laboratories therefore have available to them a range of analytical data which can be used to estimate their measurement uncertainty.

Terminology

 

The accepted definition for Measurement Uncertainty1 is:

“Parameter, associated with the result of a measurement, that characterises the dispersion of the values that could reasonably be attributed to the measurand.

NOTES:

  1. The parameter may be, for example, a standard deviation (or a given multiple of it), or the half-width of an interval having a stated level of confidence.
  2. Uncertainty of measurement comprises, in general, many components. Some of these components may be evaluated from the statistical distribution of results of a series of measurements and can be characterised by experimental standard deviations. The other components, which can also be characterised by standard deviations, are evaluated from assumed probability distributions based on experience or other information.
  3. It is understood that the result of a measurement is the best estimate of the value of a measurand, and that all components of uncertainty, including those arising from systematic effects, such as components associated with corrections and reference standards, contribute to the dispersion.”

[It is recognised that “measurement uncertainty” is the term most widely used by international organisations and accreditation agencies. However, the Codex Alimentarius Committee on Methods of Analysis and Sampling has commented on a number of occasions that the term “measurement uncertainty” has some negative associations in a legal context, and has therefore noted that an alternative, equivalent term, “measurement reliability”, may be used.]

 

Recommendations

The following recommendations are made to governments:

  1. For OIV purposes the term “measurement uncertainty” or “measurement reliability” shall be used.
  2. The measurement uncertainty or “measurement reliability” associated with all analytical results is to be estimated and must, on request, be made available to the user (customer) of the results.
  3. The measurement uncertainty or “measurement reliability” of an analytical result may be estimated by a number of procedures, notably those described by ISO1 and EURACHEM2. These documents recommend procedures based on a component-by-component approach, method validation data, internal quality control data and proficiency test data. It is not necessary to undertake an estimation using the ISO component-by-component approach if the other forms of data are available and are used to estimate the uncertainty or reliability. In many cases the overall uncertainty may be determined by an inter-laboratory (collaborative) study involving a number of laboratories and a number of matrices, conducted in accordance with the IUPAC/ISO/AOAC INTERNATIONAL3 or ISO 57254 protocols.

References

 

  1. “Guide to the Expression of Uncertainty in Measurement”, ISO, Geneva, 1993.
  2. EURACHEM/CITAC Guide, “Quantifying Uncertainty in Analytical Measurement” (Second Edition), EURACHEM Secretariat, BAM, Berlin, 2000. Available as a free download from http://www.vtt.fi/ket/eurachem.
  3. “Protocol for the Design, Conduct and Interpretation of Method Performance Studies”, ed. W. Horwitz, Pure Appl. Chem., 1995, 67, 331-343.
  4. “Precision of Test Methods”, ISO 5725, ISO, Geneva, 1994. Previous editions were published in 1981 and 1986.

Recommendations related to the recovery correction

OIV-MA-AS1-15 Recommendations related to the recovery correction

Recovery

“The OIV recommends the following practice with regards to reporting recovery of analytical results.

  • Analytical results are to be expressed on a recovery-corrected basis where appropriate and relevant; when a result has been so corrected, this must be stated.
  • If a result has been corrected for recovery, the method by which the recovery was taken into account should be stated. The recovery rate is to be quoted wherever possible.
  • When laying down provisions for standards, it will be necessary to state whether the result obtained by a method used for analysis within conformity checks shall be expressed on a recovery-corrected basis or not.”