

Actuarial Modelling of Claim Counts

Risk Classification, Credibility and Bonus-Malus Systems

Michel Denuit
Institut de Statistique, Université Catholique de Louvain, Belgium

Xavier Maréchal
Reacfin, Spin-off of the Université Catholique de Louvain, Belgium

Sandra Pitrebois
Secura, Belgium

Jean-François Walhin
Fortis, Belgium


Copyright © 2007 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone +44 1243 779777
Email (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to (+44) 1243 770620.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 6045 Freemont Blvd, Mississauga, ONT, Canada, L5R 4J3

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Anniversary Logo Design: Richard J. Pacifico

Library of Congress Cataloging in Publication Data

Actuarial Modelling of Claim Counts: Risk Classification, Credibility and Bonus-Malus Systems / Michel Denuit [et al.].
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-02677-9 (cloth)
1. Insurance, Automobile—Rates—Europe. 2. Automobile insurance claims—Europe.
I. Denuit, M. (Michel)
HG9970.2.A25 2007
368′.092094—dc22
2007019885

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN-13 978-0-470-02677-9

Typeset in 10/12pt Times by Integra Software Services Pvt. Ltd, Pondicherry, India
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire

This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.


Contents

Foreword
Preface
Notation

Part I Modelling Claim Counts

1 Mixed Poisson Models for Claim Numbers
  1.1 Introduction
    1.1.1 Poisson Modelling for the Number of Claims
    1.1.2 Heterogeneity and Mixed Poisson Model
    1.1.3 Maximum Likelihood Estimation
    1.1.4 Agenda
  1.2 Probabilistic Tools
    1.2.1 Experiment and Universe
    1.2.2 Random Events
    1.2.3 Sigma-Algebra
    1.2.4 Probability Measure
    1.2.5 Independent Events
    1.2.6 Conditional Probability
    1.2.7 Random Variables and Random Vectors
    1.2.8 Distribution Functions
    1.2.9 Independence for Random Variables
  1.3 Poisson Distribution
    1.3.1 Counting Random Variables
    1.3.2 Probability Mass Function
    1.3.3 Moments
    1.3.4 Probability Generating Function
    1.3.5 Convolution Product
    1.3.6 From the Binomial to the Poisson Distribution
    1.3.7 Poisson Process



  1.4 Mixed Poisson Distributions
    1.4.1 Expectations of General Random Variables
    1.4.2 Heterogeneity and Mixture Models
    1.4.3 Mixed Poisson Process
    1.4.4 Properties of Mixed Poisson Distributions
    1.4.5 Negative Binomial Distribution
    1.4.6 Poisson-Inverse Gaussian Distribution
    1.4.7 Poisson-LogNormal Distribution
  1.5 Statistical Inference for Discrete Distributions
    1.5.1 Maximum Likelihood Estimators
    1.5.2 Properties of the Maximum Likelihood Estimators
    1.5.3 Computing the Maximum Likelihood Estimators with the Newton–Raphson Algorithm
    1.5.4 Hypothesis Tests
  1.6 Numerical Illustration
  1.7 Further Reading and Bibliographic Notes
    1.7.1 Mixed Poisson Distributions
    1.7.2 Survey of Empirical Studies Devoted to Claim Frequencies
    1.7.3 Semiparametric Approach

2 Risk Classification
  2.1 Introduction
    2.1.1 Risk Classification, Regression Models and Random Effects
    2.1.2 Risk Sharing in Segmented Tariffs
    2.1.3 Bonus Hunger and Censoring
    2.1.4 Agenda
  2.2 Descriptive Statistics for Portfolio A
    2.2.1 Global Figures
    2.2.2 Available Information
    2.2.3 Exposure-to-Risk
    2.2.4 One-Way Analyses
    2.2.5 Interactions
    2.2.6 True Versus Apparent Dependence
  2.3 Poisson Regression Model
    2.3.1 Coding Explanatory Variables
    2.3.2 Loglinear Poisson Regression Model
    2.3.3 Score
    2.3.4 Multiplicative Tariff
    2.3.5 Likelihood Equations
    2.3.6 Interpretation of the Likelihood Equations
    2.3.7 Solving the Likelihood Equations with the Newton–Raphson Algorithm
    2.3.8 Wald Confidence Intervals
    2.3.9 Testing a Hypothesis on a Single Parameter
    2.3.10 Confidence Interval for the Expected Annual Claim Frequency
    2.3.11 Deviance
    2.3.12 Deviance Residuals
    2.3.13 Testing a Hypothesis on a Set of Parameters
    2.3.14 Specification Error and Robust Inference
    2.3.15 Numerical Illustration



  2.4 Overdispersion
    2.4.1 Explanation of the Phenomenon
    2.4.2 Interpreting Overdispersion
    2.4.3 Consequences of Overdispersion
    2.4.4 Modelling Overdispersion
    2.4.5 Detecting Overdispersion
    2.4.6 Testing for Overdispersion
  2.5 Negative Binomial Regression Model
    2.5.1 Likelihood Equations
    2.5.2 Numerical Illustration
  2.6 Poisson-Inverse Gaussian Regression Model
    2.6.1 Likelihood Equations
    2.6.2 Numerical Illustration
  2.7 Poisson-LogNormal Regression Model
    2.7.1 Likelihood Equations
    2.7.2 Numerical Illustration
  2.8 Risk Classification for Portfolio A
    2.8.1 Comparison of Competing Models with the Vuong Test
    2.8.2 Resulting Risk Classification for Portfolio A
  2.9 Ratemaking using Panel Data
    2.9.1 Longitudinal Data
    2.9.2 Descriptive Statistics for Portfolio B
    2.9.3 Poisson Regression with Serial Independence
    2.9.4 Detection of Serial Dependence
    2.9.5 Estimation of the Parameters using GEE
    2.9.6 Maximum Likelihood in the Negative Binomial Model for Panel Data
    2.9.7 Maximum Likelihood in the Poisson-Inverse Gaussian Model for Panel Data
    2.9.8 Maximum Likelihood in the Poisson-LogNormal Model for Panel Data
    2.9.9 Vuong Test
    2.9.10 Information Criteria
    2.9.11 Resulting Classification for Portfolio B
  2.10 Further Reading and Bibliographic Notes
    2.10.1 Generalized Linear Models
    2.10.2 Nonlinear Effects
    2.10.3 Zero-Inflated Models
    2.10.4 Fixed Versus Random Effects
    2.10.5 Hurdle Models
    2.10.6 Geographic Ratemaking
    2.10.7 Software

Part II Basics of Experience Rating

3 Credibility Models for Claim Counts
  3.1 Introduction
    3.1.1 From Risk Classification to Experience Rating
    3.1.2 Credibility Theory
    3.1.3 Limited Fluctuation Theory
    3.1.4 Greatest Accuracy Credibility
    3.1.5 Linear Credibility
    3.1.6 Financial Equilibrium



    3.1.7 Combining a Priori and a Posteriori Ratemaking
    3.1.8 Loss Function
    3.1.9 Agenda
  3.2 Credibility Models
    3.2.1 A Simple Introductory Example: the Good Driver / Bad Driver Model
    3.2.2 Credibility Models Incorporating a Priori Risk Classification
  3.3 Credibility Formulas with a Quadratic Loss Function
    3.3.1 Optimal Least-Squares Predictor
    3.3.2 Predictive Distribution
    3.3.3 Bayesian Credibility Premium
    3.3.4 Poisson-Gamma Credibility Model
    3.3.5 Predictive Distribution and Bayesian Credibility Premium
    3.3.6 Numerical Illustration
    3.3.7 Discrete Poisson Mixture Credibility Model
    3.3.8 Discrete Approximations for the Heterogeneous Component
    3.3.9 Linear Credibility
  3.4 Credibility Formulas with an Exponential Loss Function
    3.4.1 Optimal Predictor
    3.4.2 Poisson-Gamma Credibility Model
    3.4.3 Linear Credibility
    3.4.4 Numerical Illustration
  3.5 Dependence in the Mixed Poisson Credibility Model
    3.5.1 Intuitive Ideas
    3.5.2 Stochastic Order Relations
    3.5.3 Comparisons of Predictive Distributions
    3.5.4 Positive Dependence Notions
    3.5.5 Dependence Between Annual Claim Numbers
    3.5.6 Increasingness in the Linear Credibility Model
  3.6 Further Reading and Bibliographic Notes
    3.6.1 Credibility Models
    3.6.2 Claim Count Distributions
    3.6.3 Loss Functions
    3.6.4 Credibility and Regression Models
    3.6.5 Credibility and Copulas
    3.6.6 Time Dependent Random Effects
    3.6.7 Credibility and Panel Data Models
    3.6.8 Credibility and Empirical Bayes Methods

4 Bonus-Malus Scales
  4.1 Introduction
    4.1.1 From Credibility to Bonus-Malus Scales
    4.1.2 The Nature of Bonus-Malus Scales
    4.1.3 Relativities
    4.1.4 Bonus-Malus Scales and Markov Chains
    4.1.5 Financial Equilibrium
    4.1.6 Agenda
  4.2 Modelling Bonus-Malus Systems
    4.2.1 Typical Bonus-Malus Scales
    4.2.2 Characteristics of Bonus-Malus Scales



    4.2.3 Trajectory
    4.2.4 Transition Rules
  4.3 Transition Probabilities
    4.3.1 Definition
    4.3.2 Transition Matrix
    4.3.3 Multi-Step Transition Probabilities
    4.3.4 Ergodicity and Regular Transition Matrix
  4.4 Long-Term Behaviour of Bonus-Malus Systems
    4.4.1 Stationary Distribution
    4.4.2 Rolski–Schmidli–Schmidt–Teugels Formula
    4.4.3 Dufresne Algorithm
    4.4.4 Convergence to the Stationary Distribution
  4.5 Relativities with a Quadratic Loss Function
    4.5.1 Relativities
    4.5.2 Bayesian Relativities
    4.5.3 Interaction between Bonus-Malus Systems and a Priori Ratemaking
    4.5.4 Linear Relativities
    4.5.5 Approximations
  4.6 Relativities with an Exponential Loss Function
    4.6.1 Bayesian Relativities
    4.6.2 Fixing the Value of the Severity Parameter
    4.6.3 Linear Relativities
    4.6.4 Numerical Illustration
  4.7 Special Bonus Rule
    4.7.1 The Former Belgian Compulsory System
    4.7.2 Fictitious Levels
    4.7.3 Determination of the Relativities
    4.7.4 Numerical Illustration
    4.7.5 Linear Relativities for the Belgian Scale
  4.8 Change of Scale
    4.8.1 Migration from One Scale to Another
    4.8.2 Kolmogorov Distance
    4.8.3 Distances between the Random Effects
    4.8.4 Numerical Illustration
  4.9 Dependence in Bonus-Malus Scales
  4.10 Further Reading and Bibliographic Notes

Part III Advances in Experience Rating

5 Efficiency and Bonus Hunger
  5.1 Introduction
    5.1.1 Pure Premium
    5.1.2 Statistical Analysis of Claim Costs
    5.1.3 Large Claims and Extreme Value Theory
    5.1.4 Measuring the Efficiency of the Bonus-Malus Scales
    5.1.5 Bonus Hunger and Optimal Retention
    5.1.6 Descriptive Statistics for Portfolio C
  5.2 Modelling Claim Severities
    5.2.1 Claim Severities in Motor Third Party Liability Insurance



    5.2.2 Determining the Large Claims with Extreme Value Theory
    5.2.3 Generalized Pareto Fit to the Costs of Large Claims
    5.2.4 Modelling the Number of Large Claims
    5.2.5 Modelling the Costs of Moderate Claims
    5.2.6 Resulting Price List for Portfolio C
  5.3 Measures of Efficiency for Bonus-Malus Scales
    5.3.1 Loimaranta Efficiency
    5.3.2 De Pril Efficiency
  5.4 Bonus Hunger and Optimal Retention
    5.4.1 Correcting the Estimations for Censoring
    5.4.2 Number of Claims and Number of Accidents
    5.4.3 Lemaire Algorithm for the Determination of Optimal Retention Limits
  5.5 Further Reading and Bibliographic Notes
    5.5.1 Modelling Claim Amounts in Related Coverages
    5.5.2 Tweedie Generalized Linear Model
    5.5.3 Large Claims
    5.5.4 Alternative Approaches to Risk Classification
    5.5.5 Efficiency
    5.5.6 Optimal Retention Limits and Bonus Hunger

6 Multi-Event Systems
  6.1 Introduction
  6.2 Multi-Event Credibility Models
    6.2.1 Dichotomy
    6.2.2 Multivariate Claim Count Model
    6.2.3 Bayesian Credibility Approach
    6.2.4 Summary of Past Claims Histories
    6.2.5 Variance-Covariance Structure of the Random Effects
    6.2.6 Variance-Covariance Structure of the Annual Claim Numbers
    6.2.7 Estimation of the Variances and Covariances
    6.2.8 Linear Credibility Premiums
    6.2.9 Numerical Illustration for Portfolio A
  6.3 Multi-Event Bonus-Malus Scales
    6.3.1 Types of Claims
    6.3.2 Markov Modelling for the Multi-Event Bonus-Malus Scale
    6.3.3 Determination of the Relativities
    6.3.4 Numerical Illustrations
  6.4 Further Reading and Bibliographic Notes

7 Bonus-Malus Systems with Varying Deductibles
  7.1 Introduction
  7.2 Distribution of the Annual Aggregate Claims
    7.2.1 Modelling Claim Costs
    7.2.2 Discretization
    7.2.3 Panjer Algorithm
  7.3 Introducing a Deductible Within a Posteriori Ratemaking
    7.3.1 Annual Deductible
    7.3.2 Per Claim Deductible
    7.3.3 Mixed Case



  7.4 Numerical Illustrations
    7.4.1 Claim Frequencies
    7.4.2 Claim Severities
    7.4.3 Annual Deductible
    7.4.4 Per Claim Deductible
    7.4.5 Annual Deductible in the Mixed Case
    7.4.6 Per Claim Deductible in the Mixed Case
  7.5 Further Reading and Bibliographic Notes

8 Transient Maximum Accuracy Criterion
  8.1 Introduction
    8.1.1 From Stationary to Transient Distributions
    8.1.2 A Practical Example: Creating a Special Scale for New Entrants
    8.1.3 Agenda
  8.2 Transient Behaviour and Convergence of Bonus-Malus Scales
  8.3 Quadratic Loss Function
    8.3.1 Transient Maximum Accuracy Criterion
    8.3.2 Linear Scales
    8.3.3 Financial Balance
    8.3.4 Choice of an Initial Level
  8.4 Exponential Loss Function
  8.5 Numerical Illustrations
    8.5.1 Scale −1/Top
    8.5.2 −1/+2 Scale
  8.6 Super Bonus Level
    8.6.1 Mechanism
    8.6.2 Initial Distributions
    8.6.3 Transient Relativities
  8.7 Further Reading and Bibliographic Notes

9 Actuarial Analysis of the French Bonus-Malus System
  9.1 Introduction
  9.2 French Bonus-Malus System
    9.2.1 Modelling Claim Frequencies
    9.2.2 Probability Generating Functions of Random Vectors
    9.2.3 CRM Coefficients
    9.2.4 Computation of the CRMs at Time t
    9.2.5 Global CRM
    9.2.6 Multivariate Panjer and De Pril Recursive Formulas
    9.2.7 Analysis of the Financial Equilibrium of the French Bonus-Malus System
    9.2.8 Numerical Illustration
  9.3 Partial Liability
    9.3.1 Reduced Penalty and Modelling Claim Frequencies
    9.3.2 Computations of the CRMs at Time t
    9.3.3 Financial Equilibrium
    9.3.4 Numerical Illustrations
  9.4 Further Reading and Bibliographic Notes

Bibliography
Index


Foreword

Belgium has a long and distinguished history in actuarial science. One of its leading centres in the area is the Institut des Sciences Actuarielles at l'Université Catholique de Louvain (UCL). Since its tender beginnings in the 1970s, the Institute has grown to critical mass and now boasts an internationally renowned faculty conducting research and education in a broad range of actuarial subjects – newish ones in the interface of insurance and finance as well as more traditional ones that used to form the core of insurance mathematics. Among the latter is risk classification and experience rating in general insurance, which is the subject matter of the present book. This is an area of applied statistics that has been fetching tools from various kits of theoretical statistics, notably empirical Bayes, regression, and (generalized) linear models. However, the complexity of the typical application, featuring unobservable risk heterogeneity, imbalanced design, and nonparametric distributions, inspired independent fundamental research under the label 'credibility theory', now a cornerstone in contemporary insurance mathematics. Quite naturally, the present book is a tribute to Florian (Etienne) De Vylder, who was a Professor at UCL and one of the greatest minds in insurance mathematics and, in particular, credibility theory. The book grew out of years of studies by a collective of researchers based at UCL and its industrial environment. The lead author, Michel Denuit, is one of the most prolific researchers in contemporary actuarial science, who has published widely in actuarial and statistical journals on topics in risk theory and related basic disciplines. Also Jean-François Walhin is a well established researcher with a long list of publications in the scientific actuarial press. Together with their (even) younger co-authors Xavier Maréchal and Sandra Pitrebois, they have formed a team that is well placed to write a comprehensive reference text on risk classification and premium rating. Their combined expertise covers all theory areas that are at the base of the topic – risk theory and insurance mathematics, but also modern statistics and scientific computation. The team's total contribution to theoretical research in the subject matter of the book is substantial, and it is merged with sound practical insights gained through commitment to applicability and also through career experience outside the purely academic walks of life.

The book will be welcomed by practitioners and researchers who need a broad introduction to the titular subject area or an update aided by modern statistical methodology for complex models and high-dimensional data. The book may also serve as a textbook at graduate level, further to an introduction to basic principles explained in simple models.

The opening Chapters 1 and 2 present basic notions of risk and risk characteristics and their theoretical representation in stochastic models with fixed and random effects and more or less specified classes of distributions. Anticipating the orientation of the book, emphasis is placed on parametric models for the number of claims. This gives clarity to the exposition and also sets a suitable framework for discussion of model choice and model calibration that goes way beyond what is usually found in conventional tutorials. Poisson conditional distributions with varying exposures are merged with different mixing distributions on the individual proportional hazards, and there are extensions to generalized linear (regression) models, time trends, and spatial patterns. Statistical calibration is carried out with maximum likelihood methods but also with alternative schemes like generalized estimating functions. Ample numerical examples with authentic data give real life to the theoretical ideas throughout.

Claim counts remain a main theme, but the remainder of the book nevertheless presents a wealth of material, partly based on recent research by the authors: credibility theory, Bayes estimation with exponential loss, bonus-malus systems in a number of variations, elements of heavy-tailed distributions, bonus hunger and other 'behavioural' problems related to individual experience rating, optimal design of bonus-malus systems for aggregates of sub-portfolios, and much more. The final chapter is devoted to a carefully conducted case study of the French bonus-malus system.

I would like to thank the authors for soliciting my views on a draft version of the book and for inviting my preface to their work. Most of all I would like to thank them for undertaking the formidable task of collecting and making accessible to a wide readership an area of actuarial science that has undergone great changes over the past few decades while remaining essential to decision making in insurance.

Ragnar Norberg
London, March 2007


Preface

Motor Insurance

This book is devoted to the analysis of the number of claims filed by an insured driver over time. Property and liability motor vehicle coverage is broadly divided into first and third party coverage. First party coverage protects the vehicle owner and his property in the event he is responsible for the accident. Third party coverage provides protection in the event the vehicle owner causes harm to another party, who recovers the cost from the policyholder. First party coverages may include injury benefits (such as medical expenses and death payments) as well as comprehensive coverages.

A third party liability coverage is required in most countries for a vehicle to be allowed on the public road network. The compulsory motor third party liability insurance represents a considerable share of the yearly nonlife premium collection in developed countries. This share becomes even more prominent when first party coverages are considered (such as medical benefits, uninsured or underinsured motorist coverage, and collision and other than collision insurance). Moreover, large data bases recording policyholders' characteristics as well as claim histories are maintained by insurance companies. The economic importance and the availability of detailed information explain why a large body of the nonlife actuarial literature is devoted to this line of business.

Tort System Versus No Fault System

The liability insurance provides coverage to the policyholder if, as the driver of a covered vehicle, the policyholder injures a third party or damages a third party's property. If the policyholder is sued with respect to negligence for such bodily injury or property damage, the insurer will provide legal defense for the policyholder. If the policyholder is found to be liable, the insurer will pay, on behalf of the policyholder, damages assessed against the policyholder.

In the tort system, the insurer indemnifies the claim only if it believes the insured was at fault in the accident, or if the third party sues the insured and proves that he/she was at fault in the accident. Needless to say, a large part of the premium income is consumed by legal fees, court costs and insurers' administration expenses in such a system. Because of this, several North-American jurisdictions have implemented a no-fault motor insurance system. Even in a pure no-fault motor environment, the police still ask which driver was at fault (or the degrees to which the drivers shared the fault), because at-fault events cause the insurance premium to rise at the next policy renewal.

Insurance Ratemaking

Cost-based pricing of individual risks is a key actuarial ratemaking principle. The price charged to policyholders is an estimate of the future costs related to the insurance coverage. The pure premium approach defines the price of an insurance policy as the estimated cost of all future claims against the coverage provided by the policy while it is in effect, divided by the risk exposure, plus expenses.

The property/casualty ratemaking is based on a claim frequency distribution and a loss distribution. The claim frequency is defined as the number of incurred claims per unit of earned exposure. The exposure is measured in car-years for motor third party liability insurance (the rate manual lists rates per car-year). The average loss severity is the average payment per incurred claim. Under mild conditions, the pure premium is then the product of the average claim frequency and the average loss severity. The loss models for motor insurance are reviewed in Chapters 1–2 (frequency part) and 5 (claim amounts).
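In symbols, this decomposition of the pure premium (restated here for clarity from the definitions above) reads:

```latex
\text{pure premium}
  = \underbrace{\frac{\text{number of incurred claims}}{\text{earned exposure}}}_{\text{claim frequency}}
  \times
  \underbrace{\frac{\text{total claim payments}}{\text{number of incurred claims}}}_{\text{average loss severity}}
  = \frac{\text{total claim payments}}{\text{earned exposure}} .
```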

In liability insurance, the settlement of larger claims often requires several years. Much of the data available for the recent accident years will therefore be incomplete, in the sense that the final claim cost will not be known. In this case, loss development factors can be used to obtain a final cost estimate. The average loss severity is then based on incurred loss data. In contrast to paid loss data (that are purely objective, representing the actual payments made by the company), incurred loss data include subjective reserve estimates. The actuary has to carefully analyse the large claims since they represent a considerable share of the insurer's yearly expenses. This issue will be discussed in Chapter 5, where incurred loss data will be analysed and appropriately modelled.

Risk Classification

Nowadays, it has become extremely difficult for insurance companies to maintain cross subsidies between different risk categories in a competitive market. If, for instance, females are proved to cause significantly fewer accidents than males and if a company disregarded this variable and charged an average premium to all policyholders regardless of gender, most of its female policyholders would be tempted to move to another company offering better rates to female drivers. The former company would then be left with a disproportionate number of male policyholders and insufficient premium income to pay for the claims.

To avoid lapses in a competitive market, actuaries have to design a tariff structure that will fairly distribute the burden of claims among policyholders. The policies are partitioned into classes, with all policyholders belonging to the same class paying the same premium. Each time a competitor uses an additional rating factor, the actuary has to refine the partition to avoid losing the best drivers with respect to this factor. This explains why so many factors are used by insurance companies: this is not required by actuarial theory, but by competition among insurers.

In a free market, insurance companies need to use a rating structure that matches the premiums for the risks as closely as possible, or at least as closely as the rating structures used by competitors. This entails using virtually every available classification variable correlated to the risks, since failing to do so would mean sacrificing the chance to select against competitors, and incurring the risk of suffering adverse selection by them. It is thus the competition between insurers that leads to more and more partitioned portfolios, and not actuarial science. This trend towards more risk classification often causes social disasters: bad drivers (or, more precisely, drivers sharing the characteristics of bad drivers) cannot find coverage at a reasonable price, and are tempted to drive without insurance. Note also that even if a correlation exists between a rating factor and the risk covered by the insurer, there may be no causal relationship between that factor and the risk. Whether insurance companies should be required to establish such a causal relationship before being allowed to use a rating factor is subject to debate.

Property and liability motor vehicle insurers use classification plans to create risk classes. The classification variables introduced to partition risks into cells are called a priori variables (as their values can be determined before the policyholder starts to drive). Premiums for motor liability coverage often vary by the territory in which the vehicle is garaged, the use of the vehicle (driving to and from work or business use) and individual characteristics (such as age, gender, occupation and marital status of the main driver of the vehicle, for instance, if not precluded by legislation or regulatory rules). If the policyholders misrepresent any of these classification variables in their declaration, they are subject to loss of coverage when they are involved in a claim. There is thus a strong incentive for accurate reporting of risk characteristics.

As explained in Chapter 2, it is convenient to achieve a priori classification with the help of generalized regression models. The method can be roughly summarized as follows: one risk classification cell is chosen as the base cell. It normally has the largest amount of exposure. The rate for the base cell is referred to as the base rate. Other rate cells are defined by a variety of risk classification variables, such as territory and so on. For each risk classification variable, there is a vector of differentials, with the base cell characteristics always assigned 100 %.
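As a purely illustrative example (the figures are hypothetical, not taken from the portfolios studied in the book): with a base rate of 100 and differentials of 120 % for an urban territory and 90 % for a given vehicle use, the premium for that cell would be 100 × 1.20 × 0.90 = 108.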

In this book, we make extensive use of the generalized linear models (better known under the acronym GLM) developed after Nelder & Wedderburn (1972). These authors discovered that regression models with a response distribution belonging to the exponential family of probability distributions share the same characteristics. Members of this family include the Normal, Binomial, Poisson, Gamma and Inverse Gaussian distributions, which have been widely used by actuaries to model the number of claims, or their severities. Working in the exponential family allows the actuary to relax the very restrictive hypotheses behind the Normal linear regression model, namely:

• the response variable takes on the theoretical shape of a Normal distribution;
• the variance is constant over individuals;
• the fitted values are obtained from linear combinations of the explanatory variables (called linear predictors, or scores).

Specifically, the Normal distribution can be replaced with another member of the exponential family, heteroscedasticity can be allowed for, and fitted values can be obtained from a nonlinear transformation (called the link function) of linear predictors. Efficient algorithms are available in most statistical packages to estimate the regression parameters by maximum likelihood.
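To make this concrete, here is a minimal sketch of a loglinear Poisson regression fitted by maximum likelihood. It uses Python with the statsmodels package rather than the software employed in the book, and the variable names and toy figures are ours, purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Toy portfolio: claim counts, exposure in car-years, and one a priori
# rating factor (1 = urban, 0 = rural). All figures are made up.
claims = np.array([0, 1, 0, 2, 0, 1, 0, 0, 3, 1])
exposure = np.array([1.0, 0.5, 1.0, 1.0, 0.8, 1.0, 0.3, 1.0, 1.0, 0.9])
urban = np.array([0, 1, 0, 1, 0, 1, 0, 0, 1, 1])

# Loglinear Poisson GLM: log E[N] = log(exposure) + beta0 + beta1 * urban.
X = sm.add_constant(urban)
fit = sm.GLM(claims, X, family=sm.families.Poisson(), exposure=exposure).fit()
print(fit.summary())

# Exponentiating a coefficient gives the multiplicative differential
# attached to the corresponding rating factor.
print("urban differential:", np.exp(fit.params[1]))
```

The fitted coefficients reproduce the multiplicative tariff structure described above: the log link turns the linear predictor into a product of a base rate and one differential per rating factor.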

Pay-As-You-Drive System

Every kilometer travelled by a vehicle transfers risk to its insurer: the total cost of the coverage thus increases kilometer by kilometer. This is why several authors, including Butler (1993), suggested charging a cents-per-kilometer premium; the car-kilometer should then be adopted as the exposure unit instead of the car-year that is currently used. Motor insurance companies are adopting a new scheme called 'pay as you drive' (henceforth referred to as PAYD for the sake of brevity). Under a PAYD system, a driver pays for every kilometer driven, at a rate varying from a premium rate on the busiest roads at peak hours to a lower rate on rural roads.

Several insurance companies (including the pioneering company Norwich Union, http://www.norwichunion.com/pay-as-you-drive/) have now started to offer a motor insurance policy under a PAYD system after successful pilot schemes involving thousands of motorists. With PAYD systems, drivers are provided with in-car Global Positioning System (GPS) devices coupled with maps, enabling the insurance company to calculate insurance premiums for each journey, depending on time of day, type of road and distance travelled. A 'black box' is installed in the car and receives signals from GPS technology to determine the vehicle's current position, speed, and time and direction driven. The black box then acts as a wireless modem to transmit these inputs through standard mobile phone networks to the insurer. The insurer sends a monthly bill to the customer based on vehicle usage, including time of day, type of road and distance travelled. Historical data then provide detailed information on how, when and where cars are actually used, and on whether accidents and claims can be identified with particular factors. Moreover, the tracker detects speed infringements and, more generally, aggressiveness behind the wheel. Penalizing dangerous driving habits with higher premiums should in turn increase road safety. In addition to static measures of risk, such as the driver's age, dynamic measures, such as speed, time of day, and location, are used to give the best possible overall risk assessment.

The generalization of the PAYD system is also expected to change motorists' attitudes: like petrol, insurance is bought on a pay-as-you-drive basis, and people think of their insurance costs as related to their actual use of their vehicle. Several North-American studies demonstrate that PAYD systems could reduce motoring by more than 10 %. The PAYD rating system is expected to decrease congestion and pollution (since the busier roads usually attract the higher rates).

Experience Rating

The trend towards more classification factors has led the supervising authorities to exclude from the tariff structure certain risk factors, even though they may be significantly correlated to losses. Many states consider banning classification based on items that are beyond the control of the insured, such as gender or age. The resulting inadequacies of the a priori rating system can be corrected for by using the past number of claims to reevaluate future premiums. This is much in line with the concept of fairness: as will be seen in Chapter 2, a priori ratemaking penalizes individuals who 'look like' bad drivers (even if they are in reality excellent drivers who will never cause any accident), whereas experience rating uses the individual claim record to adjust the amount of premium. Actuarial credibility models strike a balance between the likelihood of being an unlucky good driver (who suffered a claim) and the likelihood of being a truly bad driver (who should suffer an increase in the premium paid to the insurance company for coverage). It seems fair to correct the inadequacies of the a priori system by using an adequate experience rating plan; such a 'crime and punishment' system may be more acceptable to policyholders than seemingly arbitrary a priori classifications.

Moreover, many important factors cannot be taken into account in the a priori risk classification. Think for instance of swiftness of reflexes, drinking habits or respect for the highway code. Consequently, tariff cells are still quite heterogeneous despite the use of many classification variables. This heterogeneity can be modelled by a random effect in a statistical model. It is reasonable to believe that the hidden characteristics are partly revealed by the number of claims reported by the policyholders. Several empirical studies have shown that, if insurers were allowed to use only one rating variable, it should be some form of merit rating: the best predictor of the number of claims incurred by a driver in the future is not age or vehicle type but past claims history. Hence the adjustment of the premium from the individual claims experience in order to restore fairness among policyholders, as explained in Chapter 3. In that respect, the allowance for past claims in a rating model derives from an exogenous explanation of serial correlation for longitudinal data. In this case, correlation is only apparent, and results from the revelation of hidden features in the risk characteristics.

It is worth mentioning that serial correlation for claim numbers can also receive an endogenous explanation. In this framework, the history of individuals modifies the risk they represent; this mechanism is termed 'true contagion', referring to epidemiology. For instance, a car accident may modify the perception of danger behind the wheel and lower the risk of reporting another claim in the future. Experience rating schemes also provide incentives to careful driving and should induce negative contagion. Nevertheless, the main interpretation for automobile insurance is exogenous, since positive contagion (that is, policyholders who reported claims in the past being more likely to produce claims in the future than those who did not) is always observed for numbers of claims, whereas true contagion should be negative.

Bonus-Malus Systems

In many European and Asian countries, as well as in North-American states or provinces, insurers use experience rating in order to relate premium amounts to individual past claims experience in motor insurance. Such systems penalize insured drivers responsible for one or more accidents by premium surcharges (or maluses) and reward claim-free policyholders by awarding them discounts (or bonuses). Such systems are called no-claim discounts, experience rating, merit rating, or bonus-malus systems.

Discounts for claim-free driving have been awarded in the United Kingdom as early as 1910. At that time, they were intended as an inducement to renew a policy with the same company rather than as a reward for prudent driving. The first theoretical treatments of bonus-malus systems were provided in the pioneering works of Grenander (1957a,b). The first ASTIN colloquium, held in France in 1959, was exclusively devoted to no-claim discounts in insurance, with particular reference to motor business.


There are various bonus-malus systems used around the world. A typical form of no-claim bonus in the United Kingdom is as follows:

  one claim-free year: 25 % discount
  two claim-free years: 40 % discount
  three claim-free years: 50 % discount
  four claim-free years: 60 % discount

Drivers earn an extra year of bonus for each year they remain without claims at fault, up to a maximum of four years, but lose two years of bonus each time they report a claim at fault. In such a system, the maximum bonus is achieved in only a few years and the majority of mature drivers have the maximum bonus.
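A minimal sketch of these transition rules in Python (the function name and the flooring of the bonus at zero years are our own illustrative choices):

```python
DISCOUNTS = [0.00, 0.25, 0.40, 0.50, 0.60]  # discount by years of bonus (0-4)

def next_bonus_years(years: int, claims_at_fault: int) -> int:
    """UK-style no-claim discount rule: +1 year if claim-free (capped at 4),
    -2 years per claim at fault (floored at 0)."""
    if claims_at_fault == 0:
        return min(years + 1, 4)
    return max(years - 2 * claims_at_fault, 0)

# Example: a driver with the maximum bonus reports one claim at fault,
# then stays claim-free for two years.
y = 4
for n in [1, 0, 0]:
    y = next_bonus_years(y, n)
print(y, DISCOUNTS[y])  # back to 4 years of bonus, i.e. a 60 % discount
```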

Bonus-malus systems used in Continental Europe are often more elaborate. Bonus-malus scales consist of a finite number of levels, each with its own relativity (or relative premium). The amount of premium paid by a policyholder is then the product of a base premium with the relativity corresponding to the level occupied in the scale. New policyholders have access to a specified level. After each year, the policy moves up or down according to the transition rules of the bonus-malus system. If a bonus-malus system is in force, all policies in the same tariff class are partitioned according to the level they occupy in the bonus-malus scale. In this respect, the bonus-malus mechanism can be considered as a refinement of the a priori risk evaluation, splitting each risk class into a number of subcategories according to individual past claims histories.

As explained in Chapter 4, bonus-malus systems can be modelled using (conditional) Markov chains provided they possess a certain memoryless property, which can be summarized as follows: the knowledge of the present level and of the number of claims of the present year suffices to determine the level to which the policy is transferred. In other words, the bonus-malus system satisfies the famous Markov property: the future (the level for year t + 1) depends on the present (the level for year t and the number of accidents reported during year t) and not on the past (the claim history and the levels occupied during years 1, 2, …, t − 1). This allows us to determine the optimal relativities in Chapter 4 using an asymptotic criterion based on the stationary distribution, and in Chapter 8 using transient distributions. Several performance measures for bonus-malus systems are reviewed in Chapters 5 and 8.
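As an illustration of this Markov machinery, the following sketch (our own minimal implementation, with illustrative parameter values) builds the one-year transition matrix of a −1/+2 scale of the kind studied in Chapters 4 and 8 (one level down per claim-free year, two levels up per claim) for Poisson claim counts, and extracts the stationary distribution on which the asymptotic criterion is based:

```python
import numpy as np
from math import exp, factorial

def transition_matrix(s: int, lam: float) -> np.ndarray:
    """One-year transition matrix of a -1/+2 bonus-malus scale with levels
    0..s (0 = best), for annual claim counts that are Poisson(lam)."""
    P = np.zeros((s + 1, s + 1))
    for level in range(s + 1):
        for n in range(s + 1):  # claim counts large enough to reach the top
            p = exp(-lam) * lam**n / factorial(n)
            new = max(level - 1, 0) if n == 0 else min(level + 2 * n, s)
            P[level, new] += p
        P[level, s] += 1 - P[level].sum()  # residual Poisson mass: to the top
    return P

P = transition_matrix(s=5, lam=0.1)

# Stationary distribution: left fixed point pi = pi P, found by iterating
# the chain from a uniform start until convergence.
pi = np.full(6, 1 / 6)
for _ in range(1000):
    pi = pi @ P
print(pi.round(4))
```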

During the 20th century, most European countries imposed a uniform bonus-malus system on all the companies operating in their territory. In 1994, the European Union decreed that all its member countries must drop their mandatory bonus-malus systems, claiming that such systems reduced competition between insurers and were in contradiction with the total rating freedom implemented by the Third Directive. Since that date, Belgium, for instance, has dropped its mandatory system, but all companies operating in Belgium still apply the former uniform system (with minor modifications for the policyholders occupying the lowest levels in the scale). In other European countries, however, insurers compete on the basis of bonus-malus systems. This is the case, for instance, in Spain and Portugal.

However, the mandatory French system is still in force. Quite surprisingly, the European Court of Justice decided in 2004 that both the French and the Grand Duchy of Luxembourg mandatory bonus-malus systems were not contrary to the rating freedom imposed by the European legislation. The French law thus still imposes a unique bonus-malus system on the insurers operating in France. That bonus-malus system is not based on a scale. Instead, the French bonus-malus system uses the concept of an increase-decrease coefficient (coefficient de réduction-majoration in French). More precisely, the French bonus-malus system implies a malus of 25 % per claim and a bonus of 5 % per claim-free year. Each policyholder is assigned a base premium, and this base premium is adapted according to the number of claims reported to the insurer: the premium is multiplied by 1.25 each time an accident at fault is reported to the company, and by 0.95 per claim-free year. French-type bonus-malus systems will be studied in Chapter 9.
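The resulting coefficient is simply a product over the policyholder's history. A minimal sketch of this rule (the actual French system also includes caps and special rules, discussed in Chapter 9, which are omitted here):

```python
def crm_coefficient(claims_per_year: list[int]) -> float:
    """Increase-decrease coefficient after a sequence of policy years:
    x1.25 per claim at fault, x0.95 per claim-free year."""
    coef = 1.0
    for n in claims_per_year:
        coef *= 0.95 if n == 0 else 1.25**n
    return coef

# Example: three claim-free years, then a year with one claim at fault.
print(round(crm_coefficient([0, 0, 0, 1]), 4))  # 0.95**3 * 1.25 = 1.0717
```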

Actuarial and Economic Justifications for Bonus-Malus Systems

Bonus-malus systems allow premiums to be adapted for hidden individual risk factors and increase incentives for road safety, by taking the past claim record into consideration. This can be justified by asymmetric information between the insurance company and the policyholders. Asymmetric information arises in insurance markets when firms have difficulties in judging the riskiness of those who purchase insurance coverage. There are mainly two aspects of this phenomenon: adverse selection and moral hazard. Adverse selection occurs when the policyholders have a better knowledge of their claim behaviour than the insurer does. Policyholders take advantage of information about their driving patterns, known to them but unknown to the insurer. In the context of compulsory motor third party liability insurance, adverse selection is not a significant problem compared to moral hazard when the insurance companies charge similar amounts of premium to all policyholders. Things are more complicated in a deregulated environment with companies using many risk classification factors. Since very heterogeneous driving behaviours are observed among policyholders sharing the same a priori variables, adverse selection cannot be avoided. For all the related coverages, such as comprehensive damages for instance, adverse selection always plays an important role.

Considering adverse selection in the vein of Rothschild and Stiglitz, individuals partly reveal their underlying risk through the contract they choose, a fact that has to be taken into account when setting an adequate tariff structure. In the presence of unobservable heterogeneity, riskier agents will choose a more comprehensive coverage, and low-risk insurance applicants have an interest in signalling their quality, by selecting high deductibles (excesses) for instance.

It is interesting to compare economists' and actuaries' approaches to experience rating. In the economic literature, discounts and penalties are introduced mainly to counteract the inefficiency which arises from moral hazard. In the actuarial literature, the main purpose is to better assess the individual risk so that everyone will pay, in the long run, a premium corresponding to his own claim frequency. Actuaries are thus more interested in adverse selection than in moral hazard.

Cost of Claims

The vast majority of bonus-malus systems in force around the world penalize the number of at-fault accidents reported to the company, and not their amounts. A severe accident involving bodily injuries is penalized in the same way as a fender-bender. The reason to base motor risk classification on claim frequencies alone is the long delay needed to assess the cost of bodily injury and other severe claims. Not incorporating claim sizes in bonus-malus systems and a priori risk classification requires an (implicit) assumption of independence between the random variables 'number of claims' and 'cost of a claim', as well as the belief that the latter does not depend on the driver's characteristics. This means that actuarial practice considers the cost of an accident to be, for the most part, beyond the control of a driver: a cautious driver reduces the number of accidents, but for the most part cannot control the cost of these accidents (which is largely independent of the mistake that caused it). This belief will be challenged in Chapter 5.

The penalty induced by the majority <strong>of</strong> bonus-malus systems being independent <strong>of</strong> the<br />

claim amount, policyholders have to decide whether it is pr<strong>of</strong>itable or not to report small<br />

claims (in order to avoid an increase in premium). Cheap claims are likely to be defrayed<br />

by the policyholders themselves, and not to be reported to the company. This phenomenon<br />

is known as the hunger for bonus and censors claim amounts and claim frequencies. In<br />

Chapter 5, a statistical model is specified, that takes into account the fact that only ‘expensive’<br />

claims are reported to the insurance company. Retention limits for the policyholders are<br />

determined using the Lemaire algorithm.<br />

In a few bonus-malus systems, however, reporting a ‘severe’ claim (typically, a claim<br />

with bodily injuries) entails a more severe penalty than reporting a ‘minor’ claim (typically,<br />

a claim with material damage only). In the system in force in Japan before 1993, claims<br />

involving bodily injuries were penalized by four levels, while claims with property damage<br />

only were penalized by only two levels. Bonus-malus systems using different types <strong>of</strong> events<br />

to update premium amount will be examined in Chapter 6.<br />

In Chapter 7, we examine an innovative system using variable deductibles rather than<br />

premium relativities. It differs from the systems studied in preceding chapters in that<br />

it mixes elements <strong>of</strong> both a conventional bonus-malus system and a set <strong>of</strong> deductibles<br />

depending on the level occupied in the bonus-malus scale. The first system is a conventional<br />

discount system with loss <strong>of</strong> discount in the case where a claim at fault is reported.<br />

The second system also has a variable discount scale, which can increase with claim-free<br />

experience. However, there is no stepback <strong>of</strong> the discount on claim, only a stepback <strong>of</strong> the<br />

deductible.<br />

Aims of This Book

About ten years after the seminal book 'Bonus-Malus Systems in Automobile Insurance' by Professor Jean Lemaire, we aim to offer a comprehensive treatment of the various experience rating systems applicable to automobile insurance and of their relationships with risk classification.

We hope that the present book will be useful to students in actuarial science, to actuaries (both practitioners and academics) and, more generally, to all those dealing with technical problems inside insurance companies or consulting firms. For the first time, systems taking exogenous information into account are presented in an actuarial textbook. Many numerical illustrations carried out with advanced statistical software allow for a deep understanding of the concepts.

The present book is the result of a close and fruitful collaboration between the Institute of Actuarial Science of the Université Catholique de Louvain, Louvain-la-Neuve, Belgium, its spin-off consulting firm Reacfin SA and the reinsurance company Secura, based in Brussels. This collaboration brings together academic expertise and practical experience to provide efficient solutions to motor ratemaking.

Software

The numerical illustrations presented in this book use SAS® (standing for Statistical Analysis System), a powerful software package for the manipulation and statistical analysis of data. SAS® is widely used in the insurance industry, and practising actuaries should be familiar with it. Among the large range of modules that can be added to the basic system (known as SAS®/BASE), we concentrate on the SAS®/STAT module. When no built-in procedures were available, we have coded programs in the SAS®/IML environment.

The computations of bonus-malus scales are performed with the software BM-Builder developed by Reacfin. This is a computer solution running on SAS® that enables the creation of a new bonus-malus scale by choosing the number of levels, the transition rules, etc. This scale, tailored to the insurer's portfolio, is financially balanced.

Here and there, comments about software available to perform the analyses detailed in this book will be provided to help readers interested in practical implementation. Appropriate references to the providers' websites are given for further information.

Acknowledgements

The present text originated from a series of lectures given by Michel Denuit and Jean-François Walhin to Masters students in actuarial science at different universities (including UCL, Louvain-la-Neuve, Belgium; UCBL, Lyon, France; ULP, Strasbourg, France; and INSEA, Rabat, Morocco). Both Michel Denuit and Jean-François Walhin would like to thank the students who have worked through the nonlife ratemaking courses over the past years and supplied invaluable reactions and comments.

Training sessions with insurance professionals provided practical insights into the contents of the lectures. The feedback we received from short course audiences in Bucharest, Niort, Paris, and Warsaw helped to improve the presentation of the topic.

The authors' own research in this area has benefited at various stages from discussions or collaborations with esteemed colleagues, including Jean-Philippe Boucher, Arthur Charpentier, Christophe Crochet, Jan Dhaene, Montserrat Guillén, Philippe Lambert, Stefan Lang, José Paris, Christian Partrat, Jean Pinquet, and Richard Verrall.

We gratefully acknowledge the financial support of the Communauté française de Belgique under contract 'Projet d'Actions de Recherche Concertées' ARC 04/09-320, of the Région Wallonne under project First Spin-off 'ActuR&D # 315481', of Secura, the Belgian reinsurance company, and of the Banque Nationale de Belgique under grant 'Risk measures and Economic capital'.

We would like to express our deepest gratitude to Professor Ragnar Norberg for kindly accepting to preface this book, as well as for his careful reading of a previous version of this manuscript and for the numerous resulting comments. Any errors or omissions, however, remain the responsibility of the authors. Professor Norberg's pioneering works are among the most influential contributions to credibility theory and bonus-malus systems. It is a real honour that, thirty years after his seminal work integrating bonus-malus scales in the framework of Markov chains appeared in the Scandinavian Actuarial Journal, Professor Norberg introduces the present work.

As always, it has been a pleasure to work with Wendy Hunter and Susan Barclay, Project Editors; Simon Lightfoot, Publishing Assistant; Kathryn Sharples, Commissioning Editor in Statistics and Mathematics; and Kelly Board and Sarah Kiddle, Content Editors, Engineering and Statistics at John Wiley & Sons, Ltd, and Sunita Jayachandran at Integra Software Services Pvt. Ltd.

Last but not least, we apologize to our families for the time not spent with them during the preparation of this book, and we are very grateful for their understanding.

Michel Denuit
Xavier Maréchal
Sandra Pitrebois
Jean-François Walhin

Louvain-la-Neuve and Brussels, January 2007.


Notation

Here are a few words on the notation and terminology used throughout the book. For the most part, the notation used in this book conforms to what is usual in mathematical statistics as well as in nonlife insurance mathematics.

The real line $(-\infty, +\infty)$ is denoted as $\mathbb{R}$. The half positive real line is $\mathbb{R}^+ = [0, +\infty)$. The set of the nonnegative integers is $\mathbb{N} = \{0, 1, 2, \ldots\}$. The real $n$-dimensional space is denoted as $\mathbb{R}^n$, and $\mathbb{N}^n$ is the set of all the $n$-tuples of nonnegative integers. A point of $\mathbb{R}^n$ is an $n$-dimensional vector with real coordinates. It is represented by a bold letter $\boldsymbol{x}$; the $i$th component of $\boldsymbol{x}$ is $x_i$, $i = 1, 2, \ldots, n$. All the vectors are tacitly assumed to be column vectors, that is,

\[
\boldsymbol{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]

A superscript 'T' is used to indicate transposition. Hence, $\boldsymbol{x}^T$ is a row vector with components $x_1, \ldots, x_n$. The dimension of $\boldsymbol{x}$ is denoted as $\dim(\boldsymbol{x})$.

The matrices are denoted by a capital letter in boldface, for instance $\boldsymbol{M}$; $\boldsymbol{M}^T$ denotes the transpose of $\boldsymbol{M}$. The vector of ones, that is $(1, 1, \ldots, 1)^T$, will be denoted by $\boldsymbol{e}$. The identity matrix (with entries 1 on the main diagonal and 0 elsewhere) is denoted by $\boldsymbol{I}$. The determinant of the matrix $\boldsymbol{M}$ is denoted as $\det(\boldsymbol{M})$, its inverse as $\boldsymbol{M}^{-1}$.

We denote as $o(h)$ a function of $h$ that tends to 0 faster than the identity, that is, such that

\[
\lim_{h \searrow 0} \frac{o(h)}{h} = 0.
\]

Intuitively, $o(h)$ is negligible when $h$ becomes sufficiently small.

The factorial of the positive integer $n$ is denoted as $n!$ and defined by $n! = n(n-1)\cdots 1$. By convention, $0! = 1$. The binomial coefficient $\binom{n}{k}$ denotes the number of different possible combinations of $k$ items from $n$ different items:

\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \binom{n}{n-k}.
\]

Note that the binomial coefficient is sometimes denoted as $C_n^k$, especially in the French-written mathematical literature, but here we adhere to the more standard notation $\binom{n}{k}$. The Gamma function $\Gamma(\cdot)$ is defined as

\[
\Gamma(x) = \int_0^{+\infty} t^{x-1} \exp(-t)\,dt, \quad x > 0.
\]

As, for any positive integer $n$, we have $\Gamma(n) = (n-1)!$, the Gamma function can be considered as an interpolation of the factorials defined for the positive integers. Integration by parts shows that $\Gamma(x+1) = x\,\Gamma(x)$ for any positive real $x$. When $a$ and $b$ are positive real numbers, the definition of the binomial coefficient is extended to positive real arguments as

\[
\binom{a}{b} = \frac{\Gamma(a+1)}{\Gamma(a-b+1)\,\Gamma(b+1)}.
\]

The incomplete Gamma function $\Gamma(\cdot, \cdot)$ is defined as

\[
\Gamma(\alpha, t) = \frac{1}{\Gamma(\alpha)} \int_0^t x^{\alpha-1} \exp(-x)\,dx, \quad t \ge 0.
\]

A real-valued random variable is denoted by a capital letter, for instance $X$. The mathematical expectation operator is denoted as $\mathrm{E}[\cdot]$. For instance, $\mathrm{E}[X]$ is the expectation of the random variable $X$. The variance is $\mathrm{V}[X]$, given by $\mathrm{V}[X] = \mathrm{E}[X^2] - (\mathrm{E}[X])^2$. A random vector is denoted by a bold capital letter, for instance $\boldsymbol{X} = (X_1, \ldots, X_n)^T$. Matrices should not be confused with random vectors (the context will make this clear). The variance-covariance matrix of $\boldsymbol{X}$ has the covariances $\mathrm{C}[X_i, X_j] = \mathrm{E}[X_i X_j] - \mathrm{E}[X_i]\,\mathrm{E}[X_j]$ outside the main diagonal (that is, for $i \neq j$) and the variances $\mathrm{V}[X_i]$ along the main diagonal.

The probability distributions used in this book are summarized next:
• the Bernoulli distribution with parameter $0 < q < 1$, denoted as $\mathrm{Ber}(q)$, has probability mass function $p(0) = 1 - q$ and $p(1) = q$;

• the Binomial distribution with parameters $n \in \mathbb{N}$ and $0 < q < 1$, denoted as $\mathrm{Bin}(n, q)$, has probability mass function

\[
p(k) = \binom{n}{k} q^k (1-q)^{n-k}, \quad k = 0, 1, \ldots, n;
\]

• the Poisson distribution with parameter $\lambda > 0$, denoted as $\mathrm{Poi}(\lambda)$, has probability mass function

\[
p(k) = \exp(-\lambda)\frac{\lambda^k}{k!}, \quad k \in \mathbb{N};
\]

• the Negative Binomial distribution with parameters $a > 0$ and $\theta > 0$, denoted as $\mathrm{NBin}(a, \theta)$, has probability mass function

\[
p(k) = \binom{a+k-1}{k} \left(\frac{a}{a+\theta}\right)^{a} \left(\frac{\theta}{a+\theta}\right)^{k}, \quad k \in \mathbb{N};
\]

• the Normal distribution with parameters $\mu \in \mathbb{R}$ and $\sigma^2 > 0$, denoted as $\mathrm{Nor}(\mu, \sigma^2)$, has probability density function

\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right), \quad x \in \mathbb{R};
\]

• the LogNormal distribution with parameters $\mu \in \mathbb{R}$ and $\sigma^2 > 0$, denoted as $\mathrm{LNor}(\mu, \sigma^2)$, has probability density function

\[
f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(\ln x-\mu)^2\right), \quad x \in \mathbb{R}^+;
\]

• the Negative Exponential distribution with parameter $\theta > 0$, denoted as $\mathrm{Exp}(\theta)$, has probability density function

\[
f(x) = \theta \exp(-\theta x), \quad x \in \mathbb{R}^+;
\]

• the Gamma distribution with parameters $\alpha > 0$ and $\tau > 0$, denoted as $\mathrm{Gam}(\alpha, \tau)$, has probability density function

\[
f(x) = \frac{x^{\alpha-1}\tau^{\alpha}\exp(-\tau x)}{\Gamma(\alpha)}, \quad x \in \mathbb{R}^+;
\]

• the Inverse Gaussian distribution, with parameters $\mu > 0$ and $\alpha > 0$, denoted as $\mathrm{IGau}(\mu, \alpha)$, has probability density function

\[
f(x) = \frac{\mu}{\sqrt{2\pi\alpha x^{3}}} \exp\left(-\frac{1}{2\alpha x}(x-\mu)^2\right), \quad x \in \mathbb{R}^+;
\]

• the Pareto distribution, with parameters $\alpha > 0$ and $\theta > 0$, denoted as $\mathrm{Par}(\alpha, \theta)$, has probability density function

\[
f(x) = \frac{\alpha\theta^{\alpha}}{(x+\theta)^{\alpha+1}}, \quad x \in \mathbb{R}^+;
\]

• the Uniform distribution, with parameters $a < b$, denoted as $\mathrm{Uni}(a, b)$, has probability density function

\[
f(x) = \frac{1}{b-a}, \quad a \le x \le b.
\]


Part I

Modelling Claim Counts



1

Mixed Poisson Models for Claim Numbers

1.1 Introduction

1.1.1 Poisson Modelling for the Number of Claims

In view of the economic importance of motor third party liability insurance in industrialized countries, many attempts have been made in the actuarial literature to find a probabilistic model for the distribution of the number of claims reported by insured drivers. This chapter aims to introduce the basic probability models for count data that will be applied in motor insurance. References to alternative models are gathered in the closing section of this chapter.

The Binomial distribution is the discrete probability distribution of the number of successes in a sequence of $n$ independent yes/no experiments, each of which yields a success with probability $q$. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. Two important distributions arise as approximations of Binomial distributions. If $n$ is large enough and the skewness of the distribution is not too great (that is, $q$ is not too close to 0 or 1), then the Binomial distribution is well approximated by the Normal distribution. When the number of observations $n$ is large, and the success probability $q$ is small, the corresponding Binomial distribution is well approximated by the Poisson distribution with mean $\lambda = nq$. The Poisson distribution is thus sometimes called the law of small numbers, because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen. The parallel with traffic accidents is obvious.

The Poisson distribution was discovered by Siméon-Denis Poisson (1781-1840) and published in 1838 in his work entitled Recherches sur la Probabilité des Jugements en Matières Criminelles et Matière Civile (which could be translated as 'Research on the Probability of Judgments in Criminal and Civil Matters'). Typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. For example, the number of cars passing a fixed point in a five-minute interval, or the number of claims reported to an insurance company by an insured driver in a given period. A typical characteristic associated with the Poisson distribution is certainly equidispersion: the variance of the Poisson distribution is equal to its mean.

1.1.2 Heterogeneity and Mixed Poisson Model

The Poisson distribution plays a prominent role in modelling discrete count data, mainly because of its descriptive adequacy as a model when only randomness is present and the underlying population is homogeneous. Unfortunately, this is not a realistic assumption to make in modelling many real insurance data sets. Poisson mixtures are well-known counterparts to the simple Poisson distribution for the description of inhomogeneous populations. Of special interest are populations consisting of a finite number of homogeneous sub-populations. In these cases the probability distribution of the population can be regarded as a finite mixture of Poisson distributions.

The problem of unobserved heterogeneity arises because differences in driving behaviour among individuals cannot be observed by the actuary. One of the well-known consequences of unobserved heterogeneity in count data analysis is overdispersion: the variance of the count variable is larger than its mean. Apart from its implications for the low-order moment structure of the counts, unobserved heterogeneity has important implications for the probability structure of the ensuing mixture model. The excess of zeros as well as the heavy upper tails encountered in most insurance data can be seen as implications of unobserved heterogeneity (Shaked's Two Crossings Theorem will make this clear). It is customary to allow for unobserved heterogeneity by superposing a random variable (called a random effect) on the mean parameter of the Poisson distribution, yielding a mixed Poisson model. In a mixed Poisson process, the annual expected claim frequency itself becomes random.
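To make the overdispersion induced by a random effect concrete, here is a minimal simulation sketch (in Python; the book's own illustrations use SAS®). The frequency and shape values are purely illustrative, and the Gamma specification with unit mean anticipates the Negative Binomial model discussed in Section 1.4.

```python
# Illustrative sketch (not from the book): simulating a mixed Poisson
# sample where the random effect Theta follows a Gamma distribution with
# unit mean, so that E[N] equals lam but V[N] exceeds E[N].
import numpy as np

rng = np.random.default_rng(seed=1)
lam, a = 0.15, 1.2          # hypothetical annual frequency and Gamma shape
theta = rng.gamma(shape=a, scale=1.0 / a, size=100_000)  # E[Theta] = 1
n = rng.poisson(lam * theta)                             # mixed Poisson counts

print(n.mean(), n.var())    # variance exceeds the mean: overdispersion
```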

1.1.3 Maximum Likelihood Estimation

All the models implemented in this book are parametric, in the sense that the probabilities are known functions depending on a finite number of (real-valued) parameters. The Binomial, Poisson and Normal models are examples of parametric distributions. The first step in the analysis is to select a reasonable parametric model for the observations, and then to estimate the underlying parameters. The maximum likelihood estimator is the value of the parameter (or parameter vector) that makes the observed data most likely to have occurred, given the data generating process assumed to have produced the observations. All we need to derive the maximum likelihood estimator is to formulate the statistical model in the form of a likelihood function, giving the probability of observing the data at hand. The larger the likelihood, the better the model.

Maximum likelihood estimates have several desirable asymptotic properties: consistency, efficiency, asymptotic Normality and invariance. The advantages of maximum likelihood estimation are that it fully uses all the information about the parameters contained in the data and that it is highly flexible. Most applied maximum likelihood problems lack closed-form solutions and so rely on numerical maximization of the likelihood function. The advent of fast computers has made this a minor issue in most cases. Hypothesis testing for maximum likelihood parameter estimates is straightforward thanks to the asymptotic Normal distribution of maximum likelihood estimates and the Wald and likelihood ratio tests.
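As a concrete illustration of numerical likelihood maximization, the following sketch fits a Poisson parameter to a tiny, made-up sample (Python here; the book's computations are carried out with SAS®). For the Poisson model the maximum likelihood estimator is known in closed form, namely the sample mean, which provides a check on the optimizer.

```python
# A minimal sketch of maximum likelihood estimation for a Poisson sample;
# the data are hypothetical. The Poisson case has the closed-form solution
# lambda_hat = sample mean, which the numerical optimiser should reproduce.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

counts = np.array([0, 0, 1, 0, 2, 0, 0, 1, 0, 0])   # observed claim numbers

def neg_loglik(lam):
    return -poisson.logpmf(counts, lam).sum()

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded")
print(res.x, counts.mean())   # both close to 0.4
```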

1.1.4 Agenda

Section 1.2 briefly reviews the basic probability concepts used throughout this chapter (and the entire book), for further reference. Notions including probability spaces, random variables and probability distributions are made precise in this introductory section.

In Section 1.3, we recall the main probabilistic tools to work with discrete distributions: probability mass function, distribution function, probability generating function, etc. Then, we review some basic counting distributions, including the Binomial and Poisson laws.

Section 1.4 is devoted to mixture models to account for unobserved heterogeneity. Mixed Poisson distributions are discussed, including the Negative Binomial (or Poisson-Gamma), Poisson-Inverse Gaussian and Poisson-LogNormal models.

Section 1.5 presents the maximum likelihood estimation method. Large sample properties of the maximum likelihood estimators are discussed, and testing procedures are described. The large sample properties are particularly appealing to actuaries, who usually deal with tens of thousands of observations in insurance portfolios.

Section 1.6 gives numerical illustrations on the basis of a Belgian motor third party liability insurance portfolio. The observed claim frequency distribution is fitted using the Poisson distribution and various mixed Poisson probability distributions, and the optimal model is selected on the basis of appropriate goodness-of-fit tests.

The final section, 1.7, concludes Chapter 1 by providing suggestions for further reading and bibliographic notes about the models proposed in the actuarial literature for the annual number of claims.

1.2 Probabilistic Tools

1.2.1 Experiment and Universe

Many everyday statements for actuaries take the form 'the probability of $A$ is $p$', where $A$ is some event (such as 'the total losses exceed the threshold €1 000 000' or 'the number of claims reported by a given policyholder is less than two') and $p$ is a real number between zero and one. The occurrence or nonoccurrence of $A$ depends upon the chain of circumstances under consideration. Such a particular chain is called an experiment in probability; the result of an experiment is called its outcome and the set of all outcomes (called the universe) is denoted by $\Omega$.

The word 'experiment' is used here in a very general sense, to describe virtually any process for which all possible outcomes can be specified in advance and for which the actual outcome will be one of those specified. The basic feature of an experiment is that its outcome is not definitely known by the actuary beforehand.

1.2.2 Random Events

Random events are subsets of the universe $\Omega$ associated with a given experiment. A random event is the mathematical formalization of an event described in words. It is random since we cannot predict with certainty whether it will be realized or not during the experiment. For instance, if we are interested in the number of claims incurred by a policyholder of an automobile portfolio during one year, the experiment consists in observing the driving behaviour of this individual during an annual period, and the universe $\Omega$ is simply the set $\{0, 1, 2, \ldots\}$ of the nonnegative integers. The random event $A =$ 'the policyholder reports at most one claim' is identified with the subset $\{0, 1\} \subset \Omega$.

As usual, we use $A \cup B$ and $A \cap B$ to represent the union and the intersection, respectively, of any two subsets $A$ and $B$ of $\Omega$. The union of sets is defined to be the set that contains the points that belong to at least one of the sets. The intersection of sets is defined to be the set that contains the points that are common to all the sets. These set operations correspond to the 'or' and 'and' between sentences: $A \cup B$ is the event which is realized if $A$ or $B$ is realized, and $A \cap B$ is the event realized if $A$ and $B$ are simultaneously realized during the experiment. We also define the difference between sets $A$ and $B$, denoted as $A \setminus B$, as the set of elements in $A$ but not in $B$. Finally, $\overline{A}$ is the complementary event of $A$, defined as $\Omega \setminus A$; it is the set of points of $\Omega$ that do not belong to $A$. This corresponds to the negation: $\overline{A}$ is realized if $A$ is not realized during the experiment. In particular, $\overline{\Omega} = \emptyset$, where $\emptyset$ is the empty set.

1.2.3 Sigma-Algebra

One needs to specify a family $\mathcal{A}$ of events to which probabilities can be ascribed in a consistent manner. The family $\mathcal{A}$ has to be closed under standard operations on sets; indeed, given two events $A$ and $B$ in $\mathcal{A}$, we want $A \cup B$, $A \cap B$ and $\overline{A}$ to still be events (i.e. to still belong to $\mathcal{A}$). Technically speaking, this will be the case if $\mathcal{A}$ is a sigma-algebra. Recall that a family $\mathcal{A}$ of subsets of the universe $\Omega$ is called a sigma-algebra if it fulfills the three following properties: (i) $\Omega \in \mathcal{A}$, (ii) $A \in \mathcal{A} \Rightarrow \overline{A} \in \mathcal{A}$, and (iii) $A_1, A_2, A_3, \ldots \in \mathcal{A} \Rightarrow \bigcup_{i \ge 1} A_i \in \mathcal{A}$.

The three properties (i)-(iii) are very natural. Indeed, (i) means that $\Omega$ itself is an event (it is the event which is always realized). Property (ii) means that if $A$ is an event, the complement of $A$ is also an event. Finally, property (iii) means that the event consisting in the realization of at least one of the $A_i$s is also an event.

1.2.4 Probability Measure

Once the universe $\Omega$ has been equipped with a sigma-algebra $\mathcal{A}$ of random events, a probability measure $\Pr$ can be defined on $\mathcal{A}$. The knowledge of $\Pr$ allows us to discuss the likelihood of the occurrence of events in $\mathcal{A}$. To be specific, $\Pr$ assigns to each random event $A$ its probability $\Pr[A]$; $\Pr[A]$ is the likelihood of realization of $A$. Formally, a probability measure $\Pr$ maps $\mathcal{A}$ to $[0, 1]$, with $\Pr[\Omega] = 1$, and is such that, given $A_1, A_2, A_3, \ldots \in \mathcal{A}$ which are pairwise disjoint, i.e., such that $A_i \cap A_j = \emptyset$ if $i \neq j$,

\[
\Pr\Big[\bigcup_{i \ge 1} A_i\Big] = \sum_{i \ge 1} \Pr[A_i];
\]

this technical property is usually referred to as the sigma-additivity of $\Pr$.

The properties assigned to $\Pr$ naturally follow from empirical evidence: if we were allowed to repeat an experiment a large number of times, keeping the initial conditions as equal as possible, the proportion of times that an event $A$ occurs would behave according to the definition of $\Pr$. Note that $\Pr[A]$ is then the mathematical idealization of the proportion of times $A$ occurs.

1.2.5 Independent Events

Independence is a crucial concept in probability theory. It aims to formalize the intuitive notion of 'not influencing each other' for random events: we would like to give a precise meaning to the fact that the realization of an event does not decrease nor increase the probability that the other event occurs. Formally, two events $A$ and $B$ are said to be independent if the probability of their intersection equals the product of their respective probabilities, that is, if $\Pr[A \cap B] = \Pr[A]\,\Pr[B]$.

This definition is extended to more than two events as follows. The events in a family $\mathcal{F}$ of events are independent if, for every finite sequence $A_1, A_2, \ldots, A_k$ of events in $\mathcal{F}$,

\[
\Pr\Big[\bigcap_{i=1}^{k} A_i\Big] = \prod_{i=1}^{k} \Pr[A_i]. \qquad (1.1)
\]

The concept of independence is very important in assigning probabilities to events. For instance, if two or more events are regarded as being physically independent, in the sense that the occurrence or nonoccurrence of some of them has no influence on the occurrence or nonoccurrence of the others, then this condition is translated into mathematical terms through the assignment of probabilities satisfying Equation (1.1).

1.2.6 Conditional Probability

Independence is the exception rather than the rule. In any given experiment, it is often necessary to consider the probability of an event $A$ when additional information about the outcome of the experiment has been obtained from the occurrence of some other event $B$. This corresponds to intuitive statements of the form 'if $B$ occurs then the probability of $A$ is $p$', where $B$ can be 'March is rainy' and $A$ 'the claim frequency in motor insurance increases by 5 %'. This is called the conditional probability of $A$ given $B$, and is formally defined as follows. If $\Pr[B] > 0$ then the conditional probability $\Pr[A|B]$ of $A$ given $B$ is defined to be

\[
\Pr[A|B] = \frac{\Pr[A \cap B]}{\Pr[B]}. \qquad (1.2)
\]

The definition of conditional probabilities through (1.2) is in line with empirical evidence. Repeating a given experiment a large number of times, $\Pr[A|B]$ is the mathematical idealization of the proportion of times $A$ occurs in those experiments where $B$ did occur, hence the ratio (1.2).

It is easily seen that $A$ and $B$ are independent if, and only if,

\[
\Pr[A|B] = \Pr[A|\overline{B}] = \Pr[A]. \qquad (1.3)
\]

Note that this characterization of independence is much more intuitive than the definition given above: indeed the identity expresses the natural idea that the realization or not of $B$ does not increase nor decrease the probability that $A$ occurs.

1.2.7 Random Variables and Random Vectors

Often, actuaries are not interested in an experiment itself but rather in some consequences of its random outcome. For instance, they are more concerned with the amounts the insurance company will have to pay than with the particular circumstances which give rise to the claims. Such consequences, when real-valued, may be thought of as functions mapping $\Omega$ into the real line $\mathbb{R}$.

Such functions are called random variables provided they satisfy certain desirable properties, precisely stated in the following definition: a random variable $X$ is a measurable function mapping $\Omega$ to the real numbers, i.e., $X: \Omega \to \mathbb{R}$ is such that $X^{-1}((-\infty, x]) \in \mathcal{A}$ for any $x \in \mathbb{R}$, where $X^{-1}((-\infty, x]) = \{\omega \in \Omega \mid X(\omega) \le x\}$. In other words, the measurability condition $X^{-1}((-\infty, x]) \in \mathcal{A}$ ensures that the actuary can make statements like '$X$ is less than or equal to $x$' and quantify their likelihood. Random variables are mathematical formalizations of random outcomes given by numerical values. An example of a random variable is the amount of a claim associated with the occurrence of an automobile accident.

A random vector $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T$ is a collection of $n$ univariate random variables, $X_1$, $X_2$, \ldots, $X_n$, say, defined on the same probability space $(\Omega, \mathcal{A}, \Pr)$. Random vectors are denoted by bold capital letters.

1.2.8 Distribution Functions

In many cases, neither the universe $\Omega$ nor the function $X$ need to be given explicitly. The practitioner has only to know the probability law governing $X$ or, in other words, its distribution. This means that he is interested in the probabilities that $X$ takes values in appropriate subsets of the real line (mainly intervals).

To each random variable $X$ is associated a function $F_X$ called the distribution function of $X$, describing the stochastic behaviour of $X$. Of course, $F_X$ does not indicate what the actual outcome of $X$ is, but shows how the possible values for $X$ are distributed (hence its name). More precisely, the distribution function of the random variable $X$, denoted as $F_X$, is defined as

\[
F_X(x) = \Pr[X^{-1}((-\infty, x])] \equiv \Pr[X \le x], \quad x \in \mathbb{R}.
\]

In other words, $F_X(x)$ represents the probability that the random variable $X$ assumes a value that is less than or equal to $x$. If $X$ is the total amount of claims generated by some policyholder, $F_X(x)$ is the probability that this policyholder produces a total claim amount of at most €$x$. The distribution function $F_X$ corresponds to an estimated physical probability distribution or a well-chosen subjective probability distribution.

Any distribution function $F$ has the following properties: (i) $F$ is nondecreasing, i.e. $F(x) \le F(y)$ if $x \le y$; (ii) $\lim_{x \to -\infty} F(x) = 0$; (iii) $\lim_{x \to +\infty} F(x) = 1$; (iv) $F$ is right-continuous, i.e. $\lim_{h \searrow 0} F(x+h) = F(x)$; and (v) $\Pr[a < X \le b] = F(b) - F(a)$ for any $a \le b$.



1.3 Poisson Distribution

1.3.1 Counting Random Variables

A discrete random variable $X$ assumes only a finite (or countable) number of values. The most important subclass of nonnegative discrete random variables is the integer case, where each observation (outcome) is an integer (typically, the number of claims reported to the company). More precisely, a counting random variable $N$ is valued in $\{0, 1, 2, \ldots\}$. Its stochastic behaviour is characterized by the set of probabilities $\{p_k,\ k = 0, 1, \ldots\}$ assigned to the nonnegative integers, where $p_k = \Pr[N = k]$. The (discrete) distribution of $N$ associates with each possible integer value $k = 0, 1, 2, \ldots$ the probability $p_k$ that it will be the observed value. The distribution must satisfy the two conditions:

\[
p_k \ge 0 \ \text{for all } k \quad \text{and} \quad \sum_{k=0}^{+\infty} p_k = 1,
\]

i.e. the probabilities are all nonnegative real numbers lying between zero (impossibility) and unity (certainty), and their sum must be unity because it is certain that one or other of the values will be observed.

1.3.2 Probability Mass Function

In discrete distribution theory the $p_k$s are regarded as values of a mathematical function, i.e.

\[
p_k = p(k|\boldsymbol{\theta}), \quad k = 0, 1, 2, \ldots, \qquad (1.4)
\]

where $p(\cdot|\boldsymbol{\theta})$ is a known function depending on a set of parameters $\boldsymbol{\theta}$. The function $p(\cdot|\boldsymbol{\theta})$ defined in (1.4) is usually called the probability mass function. Different functional forms lead to different discrete distributions. This is a parametric model.

The distribution function $F_N: \mathbb{R} \to [0, 1]$ of $N$ gives, for any real threshold $x$, the probability for $N$ to be smaller than or equal to $x$. The distribution function $F_N$ of $N$ is related to the probability mass function via

\[
F_N(x) = \sum_{k=0}^{\lfloor x \rfloor} p_k, \quad x \in \mathbb{R}^+,
\]

where $p_k$ is given by Expression (1.4) and where $\lfloor x \rfloor$ denotes the largest integer $n$ such that $n \le x$ (it is thus the integer part of $x$). Considering (1.4), $F_N$ also depends on $\boldsymbol{\theta}$.

1.3.3 Moments

There are various useful and important quantities associated with a probability distribution. They may be used to summarize features of the distribution. The most familiar and widely used are the moments, particularly the mean

\[
\mathrm{E}[N] = \sum_{k=0}^{+\infty} k\, p_k,
\]

which is given by the sum of the products of all the possible outcomes multiplied by their probability, and the variance

\[
\mathrm{V}[N] = \mathrm{E}\big[(N - \mathrm{E}[N])^2\big] = \sum_{k=0}^{+\infty} \big(k - \mathrm{E}[N]\big)^2 p_k,
\]

which is given by the sum of the products of the squared differences between all the outcomes and the mean, multiplied by their probability. Expanding the squared difference in the definition of the variance, it is easily seen that the variance can be reformulated as

\[
\mathrm{V}[N] = \mathrm{E}\big[N^2 - 2N\,\mathrm{E}[N] + (\mathrm{E}[N])^2\big] = \mathrm{E}[N^2] - (\mathrm{E}[N])^2,
\]

which provides a convenient way to compute the variance as the difference between the second moment $\mathrm{E}[N^2]$ and the square $(\mathrm{E}[N])^2$ of the first moment $\mathrm{E}[N]$. The mean and the variance are commonly denoted as $\mu$ and $\sigma^2$, respectively. Considering (1.4), both $\mathrm{E}[N]$ and $\mathrm{V}[N]$ are functions of $\boldsymbol{\theta}$, that is,

\[
\mathrm{E}[N] = \mu(\boldsymbol{\theta}) \quad \text{and} \quad \mathrm{V}[N] = \sigma^2(\boldsymbol{\theta}).
\]

The mean is used as a measure of the location of the distribution: it is an average of the possible outcomes $0, 1, \ldots$ weighted by the corresponding probabilities $p_0, p_1, \ldots$ The variance is widely used as a measure of the spread of the distribution: it is a weighted average of the squared distances between the outcomes $0, 1, \ldots$ and the expected value $\mathrm{E}[N]$. Recall that $\mathrm{E}[\cdot]$ is a linear operator. From the properties of $\mathrm{E}[\cdot]$, it is easily seen that the variance $\mathrm{V}[\cdot]$ is shift-invariant and additive for independent random variables.

The degree of asymmetry of the distribution of a random variable $N$ is measured by its skewness, denoted as $\gamma[N]$. The skewness is the third central moment of $N$, normalized by its variance raised to the power 3/2 (in order to get a number without unit). Precisely, the skewness of $N$ is given by

\[
\gamma[N] = \frac{\mathrm{E}\big[(N - \mathrm{E}[N])^3\big]}{(\mathrm{V}[N])^{3/2}}.
\]

For any random variable $N$ with a symmetric distribution, the skewness $\gamma[N]$ is zero. Positively skewed distributions tend to concentrate most of the probability mass on small values, but the remaining probability is stretched over a long range of larger values.

There are other related sets of constants, such as the cumulants, the factorial moments, the factorial cumulants, etc., which may be more convenient to use in some circumstances. For details about these constants, we refer the reader, e.g., to Johnson et al. (1992).
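The moment formulas above translate directly into a few lines of code. The sketch below (illustrative only, in Python rather than the book's SAS®) evaluates the mean, variance and skewness of a counting distribution from its probability mass function, truncating the infinite sums where the remaining mass is negligible; the Poisson probabilities of Section 1.3.6 serve as the example.

```python
# A small sketch computing mean, variance and skewness of a counting
# distribution directly from its probability mass function, following
# the formulas above (Poisson pmf truncated at a high k as an example).
import numpy as np
from scipy.stats import poisson

lam = 0.7
k = np.arange(0, 50)               # truncation point chosen large enough
p = poisson.pmf(k, lam)

mean = (k * p).sum()
var = ((k - mean) ** 2 * p).sum()
skew = ((k - mean) ** 3 * p).sum() / var ** 1.5
print(mean, var, skew)             # approximately lam, lam, 1/sqrt(lam)
```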

1.3.4 Probability Generating Function

In principle all the theoretical properties of the distribution can be derived from the probability mass function. There are, however, several other functions from which exactly the same information can be derived. This is because the functions are all one-to-one transformations of each other, so each characterizes the distribution. One particularly useful function is the probability generating function, which is defined as

\[
\varphi_N(z) = \mathrm{E}[z^N] = \sum_{k=0}^{+\infty} p_k z^k, \quad 0 \le z \le 1. \qquad (1.5)
\]

The probability generating function of the sum $N_1 + N_2$ of two independent counting random variables $N_1$ and $N_2$ is easily obtained from

\[
\varphi_{N_1+N_2}(z) = \mathrm{E}\big[z^{N_1+N_2}\big] = \mathrm{E}\big[z^{N_1}\big]\,\mathrm{E}\big[z^{N_2}\big] = \varphi_{N_1}(z)\,\varphi_{N_2}(z),
\]

since the mutual independence of $N_1$ and $N_2$ ensures that

\[
\mathrm{E}\big[z^{N_1+N_2}\big] = \sum_{k_1=0}^{+\infty} \sum_{k_2=0}^{+\infty} z^{k_1+k_2} \Pr[N_1 = k_1]\,\Pr[N_2 = k_2]
= \left(\sum_{k_1=0}^{+\infty} z^{k_1} \Pr[N_1 = k_1]\right)\left(\sum_{k_2=0}^{+\infty} z^{k_2} \Pr[N_2 = k_2]\right)
= \mathrm{E}\big[z^{N_1}\big]\,\mathrm{E}\big[z^{N_2}\big].
\]

Summing random variables thus corresponds to a convolution product for probability mass functions and to a regular product for probability generating functions. An expansion of $\varphi_{N_1}(\cdot)\varphi_{N_2}(\cdot)$ as a series in powers of $z$ then gives the probability mass function of $N_1 + N_2$, usually in a much easier way than computing the convolution product of the probability mass functions of $N_1$ and $N_2$.
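The correspondence between sums of independent counts and convolution products can be checked numerically. In this small sketch (illustrative, not taken from the book), the pmf $(1-q, q)$ of a Bernoulli variable is convolved with itself $n$ times, which reproduces the Binomial pmf discussed in the next subsection.

```python
# Sketch: repeated convolution of pmfs corresponds to summing independent
# counts. Convolving the pmf (1-q, q) of a Bernoulli variable with itself
# n times yields the Binomial pmf of the next subsection.
import numpy as np
from scipy.stats import binom

q, n = 0.3, 5
pmf = np.array([1.0])
for _ in range(n):
    pmf = np.convolve(pmf, [1 - q, q])   # add one more Bernoulli trial

print(np.allclose(pmf, binom.pmf(np.arange(n + 1), n, q)))  # True
```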

1.3.6 From the Binomial to the Poisson Distribution

Bernoulli Distribution

The Bernoulli distribution is an extremely simple and basic distribution. It arises from what is known as a Bernoulli trial: a single observation is taken where the outcome is dichotomous, e.g., success or failure, alive or dead, male or female, 0 or 1. The probability of success is $q$; the probability of failure is $1 - q$.

If $N$ is Bernoulli distributed with success probability $q$, which is denoted as $N \sim \mathrm{Ber}(q)$, we have

\[
p(k|q) = \begin{cases} 1 - q & \text{if } k = 0, \\ q & \text{if } k = 1, \\ 0 & \text{otherwise.} \end{cases}
\]

There is thus just one parameter: the success probability $q$. The mean is

\[
\mathrm{E}[N] = 0 \times (1-q) + 1 \times q = q \qquad (1.6)
\]

and the variance is

\[
\mathrm{V}[N] = \mathrm{E}[N^2] - q^2 = q - q^2 = q(1-q). \qquad (1.7)
\]

The probability generating function is

\[
\varphi_N(z) = (1-q) \times z^0 + q \times z^1 = 1 - q + qz. \qquad (1.8)
\]

It is easily seen that $\varphi_N(0) = p(0|q)$ and $\varphi_N'(0) = p(1|q)$, as it should be.


Binomial Distribution

The Binomial distribution describes the outcome of a sequence of $n$ independent Bernoulli trials, each with the same probability $q$ of success. The probability that success is the outcome in exactly $k$ of the trials is

\[
p(k|n, q) = \binom{n}{k} q^k (1-q)^{n-k}, \quad k = 0, 1, \ldots, n, \qquad (1.9)
\]

and 0 otherwise. Formula (1.9) defines the Binomial distribution. There are now two parameters: the number of trials $n$ (also called the exponent, or size) and the success probability $q$. Henceforth, we write $N \sim \mathrm{Bin}(n, q)$ to indicate that $N$ is Binomially distributed, with size $n$ and success probability $q$.

Moments of the Binomial Distribution

The mean of $N \sim \mathrm{Bin}(n, q)$ is

\[
\mathrm{E}[N] = \sum_{k=1}^{n} \frac{n!}{(k-1)!\,(n-k)!}\, q^k (1-q)^{n-k}
= nq \sum_{k=1}^{n} \Pr[M = k-1] = nq, \qquad (1.10)
\]

where $M \sim \mathrm{Bin}(n-1, q)$. Furthermore, with $M$ as defined before,

\[
\mathrm{E}[N^2] = \sum_{k=1}^{n} \frac{n!}{(k-1)!\,(n-k)!}\, k\, q^k (1-q)^{n-k}
= nq \sum_{k=1}^{n} k \Pr[M = k-1] = n(n-1)q^2 + nq,
\]

so that the variance is

\[
\mathrm{V}[N] = \mathrm{E}[N^2] - (nq)^2 = nq(1-q). \qquad (1.11)
\]

We immediately observe that the Binomial distribution is underdispersed, i.e. its variance is smaller than its mean: $\mathrm{V}[N] = nq(1-q) \le \mathrm{E}[N] = nq$.

Probability Generating Function and Closure under Convolution for the Binomial Distribution

The probability generating function of $N \sim \mathrm{Bin}(n, q)$ is

\[
\varphi_N(z) = \sum_{k=0}^{n} \binom{n}{k} (qz)^k (1-q)^{n-k} = (1 - q + qz)^n. \qquad (1.12)
\]

Note that Expression (1.12) is the Bernoulli probability generating function (1.8) raised to the $n$th power. This was expected, since the Binomial random variable $N$ can be seen as the sum of $n$ independent Bernoulli random variables with equal success probability $q$. This also explains why (1.10) is equal to $n$ times (1.6) and why (1.11) is equal to $n$ times (1.7) (in the latter case, because the variance is additive for independent random variables).

From (1.12), we also see that, having independent random variables $N_1 \sim \mathrm{Bin}(n_1, q)$ and $N_2 \sim \mathrm{Bin}(n_2, q)$, the sum $N_1 + N_2$ is still Binomially distributed. This comes from the fact that the probability generating function of $N_1 + N_2$ is

\[
\varphi_{N_1+N_2}(z) = \varphi_{N_1}(z)\,\varphi_{N_2}(z) = (1 - q + qz)^{n_1+n_2},
\]

so that $N_1 + N_2 \sim \mathrm{Bin}(n_1 + n_2, q)$. Note that this is no longer the case if the success probabilities differ.

Limiting Form of the Binomial Distribution

When $n$ becomes large, (1.9) may be approximated by a Normal distribution according to the De Moivre-Laplace theorem. The approximation is much improved when a continuity correction is applied. The Poisson distribution can be obtained as a limiting case of the Binomial distribution when $n$ tends to infinity together with $q$ becoming very small. Specifically, let us assume that $N_n \sim \mathrm{Bin}(n, \lambda/n)$ and let $n$ tend to $+\infty$. The probability mass at 0 then becomes

\[
\Pr[N_n = 0] = \Big(1 - \frac{\lambda}{n}\Big)^n \to \exp(-\lambda) \quad \text{as } n \to +\infty.
\]

To get the probability masses on the positive integers, let us compute the ratio

\[
\frac{\Pr[N_n = k+1]}{\Pr[N_n = k]} = \frac{\frac{n-k}{k+1}\,\frac{\lambda}{n}}{1 - \frac{\lambda}{n}} \to \frac{\lambda}{k+1} \quad \text{as } n \to +\infty,
\]

from which we conclude

\[
\lim_{n \to +\infty} \Pr[N_n = k] = \exp(-\lambda)\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots
\]
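The limiting argument can also be observed numerically: for a fixed $\lambda$, the $\mathrm{Bin}(n, \lambda/n)$ probabilities approach their $\mathrm{Poi}(\lambda)$ counterparts as $n$ grows. A small sketch with illustrative values:

```python
# Sketch of the limiting argument: for fixed lam, the Bin(n, lam/n)
# probabilities approach the Poi(lam) probabilities as n grows.
import numpy as np
from scipy.stats import binom, poisson

lam, k = 2.0, np.arange(0, 10)
for n in (10, 100, 10_000):
    err = np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)).max()
    print(n, err)   # the maximal difference shrinks as n increases
```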

Poisson Distribution

The Poisson random variable takes its values in $\{0, 1, \ldots\}$ and has probability mass function

\[
p(k|\lambda) = \exp(-\lambda)\frac{\lambda^k}{k!}, \quad k = 0, 1, \ldots \qquad (1.13)
\]

Having a counting random variable $N$, we denote as $N \sim \mathrm{Poi}(\lambda)$ the fact that $N$ is Poisson distributed with parameter $\lambda$. The Poisson distribution occupies a central position in discrete distribution theory, analogous to that occupied by the Normal distribution in continuous distribution theory. It also has many practical applications.

The Poisson distribution describes events that occur randomly and independently in space or time. A classic example in physics is the number of radioactive particles recorded by a Geiger counter in a fixed time interval. This property of the Poisson distribution means that it can act as a reference standard when deviations from pure randomness are suspected. Although the Poisson distribution is often called the law of small numbers, there is no need for $\lambda = nq$ to be small. It is the largeness of $n$ and the smallness of $q = \lambda/n$ that are important. However, most of the data sets analysed in the literature show a small frequency; this will be the case with the motor data sets used in insurance applications.

Moments of the Poisson Distribution

If $N \sim \mathrm{Poi}(\lambda)$, then its expected value is given by

\[
\mathrm{E}[N] = \sum_{k=1}^{+\infty} k \exp(-\lambda)\frac{\lambda^k}{k!}
= \exp(-\lambda) \sum_{k=0}^{+\infty} \frac{\lambda^{k+1}}{k!} = \lambda. \qquad (1.14)
\]

Moreover,

\[
\mathrm{E}[N^2] = \sum_{k=1}^{+\infty} k^2 \exp(-\lambda)\frac{\lambda^k}{k!}
= \exp(-\lambda) \sum_{k=0}^{+\infty} (k+1)\frac{\lambda^{k+1}}{k!} = \lambda + \lambda^2,
\]

so that the variance of $N$ is equal to

\[
\mathrm{V}[N] = \mathrm{E}[N^2] - \lambda^2 = \lambda. \qquad (1.15)
\]

Considering Expressions (1.14) and (1.15), we see that both the mean and the variance of the Poisson distribution are equal to $\lambda$, a phenomenon termed equidispersion.

The skewness of $N \sim \mathrm{Poi}(\lambda)$ is

\[
\gamma[N] = \frac{1}{\sqrt{\lambda}}. \qquad (1.16)
\]

Clearly, $\gamma[N]$ decreases with $\lambda$. For small values of $\lambda$ the distribution is very skewed (asymmetric), but as $\lambda$ increases it becomes less skewed and is nearly symmetric by $\lambda = 15$.

Probability Generating Function and Closure Under Convolution for the Poisson Distribution

The probability generating function of the Poisson distribution has a very simple form. Coming back to the Equation (1.5) defining $\varphi_N$ and replacing the $p_k$s with their expression (1.13) gives

\[
\varphi_N(z) = \sum_{k=0}^{+\infty} \exp(-\lambda)\frac{(\lambda z)^k}{k!} = \exp\big(\lambda(z-1)\big). \qquad (1.17)
\]

This shows that the Poisson distribution is closed under convolution. Having independent random variables $N_1 \sim \mathrm{Poi}(\lambda_1)$ and $N_2 \sim \mathrm{Poi}(\lambda_2)$, the probability generating function of the sum $N_1 + N_2$ is

\[
\varphi_{N_1+N_2}(z) = \varphi_{N_1}(z)\,\varphi_{N_2}(z)
= \exp\big(\lambda_1(z-1)\big)\exp\big(\lambda_2(z-1)\big) = \exp\big((\lambda_1+\lambda_2)(z-1)\big),
\]

so that $N_1 + N_2 \sim \mathrm{Poi}(\lambda_1 + \lambda_2)$.

The sum of two independent Poisson distributed random variables is thus also Poisson distributed, with parameter equal to the sum of the Poisson parameters. This property obviously extends to any number of terms, and the Poisson distribution is said to be closed under convolution (i.e. the convolution of Poisson distributions is still Poisson).
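Closure under convolution is also easy to verify numerically: convolving the pmf vectors of $\mathrm{Poi}(\lambda_1)$ and $\mathrm{Poi}(\lambda_2)$ reproduces the pmf of $\mathrm{Poi}(\lambda_1 + \lambda_2)$ to machine precision on the retained range. A short sketch with illustrative parameters:

```python
# Sketch checking closure under convolution numerically: convolving the
# Poi(lam1) and Poi(lam2) pmfs reproduces the Poi(lam1 + lam2) pmf on the
# first k.size entries (which are unaffected by truncation).
import numpy as np
from scipy.stats import poisson

k = np.arange(0, 40)
lam1, lam2 = 0.8, 1.5
p_conv = np.convolve(poisson.pmf(k, lam1), poisson.pmf(k, lam2))[: k.size]
print(np.abs(p_conv - poisson.pmf(k, lam1 + lam2)).max())  # ~1e-16
```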

1.3.7 Poisson Process

Definition

Recall that a stochastic process is a collection of random variables $\{N(t),\ t \in \mathcal{T}\}$ indexed by a real-valued parameter $t$ taking values in the index set $\mathcal{T}$. Usually, $\mathcal{T}$ represents a set of observation times. In this book, we will be interested in continuous-time stochastic processes, where $\mathcal{T} = \mathbb{R}^+$.

A stochastic process $\{N(t),\ t \ge 0\}$ is said to be a counting process if $t \mapsto N(t)$ is right-continuous and $N(t) - N(t-)$ is 0 or 1. Intuitively speaking, $N(t)$ represents the total number of events that have occurred up to time $t$. Such a process enjoys the following properties: (i) $N(t) \ge 0$, (ii) $N(t)$ is integer valued, (iii) if $s < t$, then $N(s) \le N(t)$, and (iv) for $s < t$, $N(t) - N(s)$ is the number of events that have occurred in the interval $(s, t]$.

A counting process $\{N(t),\ t \ge 0\}$ is a Poisson process with rate $\lambda > 0$ if

(i) the process has stationary increments, that is,

\[
\Pr[N(t+\Delta) - N(t) = k] = \Pr[N(s+\Delta) - N(s) = k]
\]

for any integer $k$, instants $s \le t$ and increment $\Delta > 0$;

(ii) the process has independent increments, that is, for any integer $k > 0$ and instants $0 \le t_0 < t_1 < \cdots < t_k$, the increments $N(t_1) - N(t_0), \ldots, N(t_k) - N(t_{k-1})$ are mutually independent random variables;

(iii) in a small interval of length $\Delta t$, a single event occurs with probability $\lambda\,\Delta t + o(\Delta t)$ and more than one event occurs with probability $o(\Delta t)$.

Intuitively speaking, these conditions mean that the probability that the policyholder reports two or more claims in a sufficiently small time interval is negligible when compared to the probability that he reports zero or only one claim.

Link with the Poisson Distribution

The Poisson process is intimately linked to the Poisson distribution, as precisely stated in the next result.

Property 1.1 For any Poisson process, the number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$, that is, for all $s, t \ge 0$,

\[
\Pr[N(t+s) - N(s) = n] = \exp(-\lambda t)\frac{(\lambda t)^n}{n!}, \quad n = 0, 1, 2, \ldots
\]

Proof. Without loss of generality, we only have to prove that $N(t) \sim \mathrm{Poi}(\lambda t)$. For any integer $k$, let us denote $p_k(t) = \Pr[N(t) = k]$, $t \ge 0$. The announced result for $k = 0$ comes from

\[
\begin{aligned}
p_0(t + \Delta t) &= \Pr[N(t) = 0 \text{ and } N(t+\Delta t) - N(t) = 0] \\
&= \Pr[N(t) = 0]\,\Pr[N(t+\Delta t) - N(t) = 0] \\
&= p_0(t)\, p_0(\Delta t) \\
&= p_0(t)\big(1 - \lambda\,\Delta t + o(\Delta t)\big),
\end{aligned}
\]

where the joint probability factors into two terms since the increments of a Poisson process are independent random variables. This gives

\[
\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda\, p_0(t) + \frac{o(\Delta t)}{\Delta t}\, p_0(t).
\]

Taking the limit for $\Delta t \searrow 0$ yields

\[
\frac{d}{dt}\, p_0(t) = -\lambda\, p_0(t).
\]

This differential equation with the initial condition $p_0(0) = 1$ admits the solution

\[
p_0(t) = \exp(-\lambda t), \qquad (1.18)
\]

which is in fact the $\mathrm{Poi}(\lambda t)$ probability mass function evaluated at the origin.

For $k \ge 1$, let us write

\[
\begin{aligned}
p_k(t + \Delta t) &= \Pr[N(t+\Delta t) = k] \\
&= \Pr[N(t+\Delta t) = k \mid N(t) = k]\,\Pr[N(t) = k] \\
&\quad + \Pr[N(t+\Delta t) = k \mid N(t) = k-1]\,\Pr[N(t) = k-1] \\
&\quad + \sum_{j=2}^{k} \Pr[N(t+\Delta t) = k \mid N(t) = k-j]\,\Pr[N(t) = k-j] \\
&= \Pr[N(t+\Delta t) - N(t) = 0]\,\Pr[N(t) = k] \\
&\quad + \Pr[N(t+\Delta t) - N(t) = 1]\,\Pr[N(t) = k-1] \\
&\quad + \sum_{j=2}^{k} \Pr[N(t+\Delta t) - N(t) = j]\,\Pr[N(t) = k-j].
\end{aligned}
\]

Since the increments of a Poisson process are independent random variables, we can write

\[
p_k(t + \Delta t) = p_0(\Delta t)\, p_k(t) + p_1(\Delta t)\, p_{k-1}(t) + \sum_{j=2}^{k} p_j(\Delta t)\, p_{k-j}(t)
= (1 - \lambda\,\Delta t)\, p_k(t) + \lambda\,\Delta t\, p_{k-1}(t) + o(\Delta t).
\]

This gives

\[
\frac{p_k(t + \Delta t) - p_k(t)}{\Delta t} = \lambda\, p_{k-1}(t) - \lambda\, p_k(t) + \frac{o(\Delta t)}{\Delta t}.
\]

Taking the limit for $\Delta t \searrow 0$ yields, as above,

\[
\frac{d}{dt}\, p_k(t) = \lambda\, p_{k-1}(t) - \lambda\, p_k(t), \quad k \ge 1. \qquad (1.19)
\]

Multiplying each of the equations (1.19) by $z^k$ and summing over $k$ gives

\[
\sum_{k=0}^{+\infty} \Big(\frac{d}{dt}\, p_k(t)\Big) z^k = \lambda z \sum_{k=0}^{+\infty} p_k(t)\, z^k - \lambda \sum_{k=0}^{+\infty} p_k(t)\, z^k. \qquad (1.20)
\]

Denoting as $\varphi_t$ the probability generating function of $N(t)$, Equation (1.20) becomes

\[
\frac{\partial}{\partial t}\, \varphi_t(z) = \lambda(z-1)\, \varphi_t(z). \qquad (1.21)
\]

With the condition $\varphi_0(z) = 1$, Equation (1.21) has the solution

\[
\varphi_t(z) = \exp\big(\lambda t(z-1)\big),
\]

where we recognize the $\mathrm{Poi}(\lambda t)$ probability generating function (1.17). This ends the proof. □

When the hypotheses behind a Poisson process are verified, the number $N(1)$ of claims hitting a policy during a period of length 1 is Poisson distributed with parameter $\lambda$. So, a counting process $\{N(t),\ t \ge 0\}$, starting from $N(0) = 0$, is a Poisson process with rate $\lambda > 0$ if

(i) the process has independent increments;

(ii) the number of events in any interval of length $t$ follows a Poisson distribution with mean $\lambda t$ (therefore it has stationary increments), i.e.

\[
\Pr[N(t+s) - N(s) = k] = \exp(-\lambda t)\frac{(\lambda t)^k}{k!}, \quad k = 0, 1, 2, \ldots
\]

Exposure-to-Risk

The Poisson process setting is useful when one wants to analyse policyholders that have been observed during periods of unequal lengths. Assume that the claims occur according to a Poisson process with rate $\lambda$. If the policyholder is covered by the company for a period of length $d$ then the number $N$ of claims reported to the company has probability mass function

$$\Pr[N=k] = \exp(-\lambda d)\frac{(\lambda d)^k}{k!}, \qquad k = 0, 1, \ldots,$$

that is, $N \sim \mathcal{P}oi(\lambda d)$. In actuarial studies, $d$ is referred to as the exposure-to-risk. We see that $d$ simply multiplies the annual expected claim frequency $\lambda$ in the Poisson model.
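To make the exposure adjustment concrete, here is a minimal sketch; the frequency and exposure figures are purely illustrative:

```python
from math import exp, factorial

def poisson_pmf(k, lam, d=1.0):
    """Pr[N = k] for N ~ Poi(lam * d), d being the exposure-to-risk."""
    mean = lam * d
    return exp(-mean) * mean ** k / factorial(k)

# A driver with annual expected frequency 10 % insured for half a year:
print(poisson_pmf(0, lam=0.10, d=0.5))  # ~0.9512, no claim reported
print(poisson_pmf(1, lam=0.10, d=0.5))  # ~0.0476, exactly one claim
```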

Time Between Accidents

The Poisson distribution arises for events occurring randomly and independently in time. Indeed, denote as $T_1, T_2, \ldots$ the times between two consecutive accidents. Assume further that these accidents occur according to a Poisson process with rate $\lambda$. Then, the $T_k$s are independent and identically distributed and

$$\Pr[T_k > t] = \Pr[T_1 > t] = \Pr[N(t)=0] = \exp(-\lambda t),$$

so that $T_1, T_2, \ldots$ have a common Negative Exponential distribution. Note that in this case, the equality

$$\Pr[T_k > s+t \mid T_k > s] = \frac{\Pr[T_k > s+t]}{\Pr[T_k > s]} = \Pr[T_k > t]$$

holds for any $s$ and $t \ge 0$. It is not difficult to see that this memoryless property is related to the fact that the increments of the process $\{N(t),\, t\ge 0\}$ are independent and stationary. Assuming that the claims occur according to a Poisson process is thus equivalent to assuming that the time between two consecutive claims has a Negative Exponential distribution.
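This equivalence suggests a simple simulation check. The sketch below (the rate is chosen arbitrarily) builds claim counts from Exponential inter-arrival times and verifies that the mean and variance of $N(1)$ both approach $\lambda$, as a Poisson count should:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
lam = 0.8  # claim rate, an arbitrary illustrative value

def n_claims(horizon=1.0):
    """Count events in [0, horizon] by summing Exp(lam) inter-arrival times."""
    t, n = rng.exponential(scale=1.0 / lam), 0
    while t <= horizon:
        n += 1
        t += rng.exponential(scale=1.0 / lam)
    return n

sample = np.array([n_claims() for _ in range(100_000)])
print(sample.mean(), sample.var())  # both close to lam = 0.8
```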

Nonhomogeneous Poisson Process

A generalization of the Poisson process is obtained by letting the rate of the process vary with time. We then replace the constant rate $\lambda$ by a function $t \mapsto \lambda(t)$ of time $t$ and we define the nonhomogeneous Poisson process with rate $\lambda(\cdot)$. The Poisson process defined above (with a constant rate) is then termed the homogeneous Poisson process. A counting process $\{N(t),\, t\ge 0\}$ starting from $N(0)=0$ is said to be a nonhomogeneous Poisson process with rate $\lambda(\cdot)$, where $\lambda(t)\ge 0$ for all $t\ge 0$, if it satisfies the following conditions:

(i) the process $\{N(t),\, t\ge 0\}$ has independent increments, and
(ii)

$$\Pr[N(t+h)-N(t)=k] = \begin{cases} 1-\lambda(t)h + o(h) & \text{if } k = 0,\\ \lambda(t)h + o(h) & \text{if } k = 1,\\ o(h) & \text{if } k \ge 2. \end{cases}$$

The only difference between the nonhomogeneous Poisson process and the homogeneous Poisson process is that the rate may vary with time, resulting in the loss of the stationary increment property.

For any nonhomogeneous Poisson process $\{N(t),\, t\ge 0\}$, the number of events in the interval $(s, t]$, $s \le t$, is Poisson distributed with mean

$$m(s,t) = \int_s^t \lambda(u)\,du,$$

that is,

$$\Pr[N(t)-N(s)=k] = \exp\bigl(-m(s,t)\bigr)\frac{\bigl(m(s,t)\bigr)^k}{k!}, \qquad k = 0, 1, \ldots$$

In the homogeneous case, we obviously have $m(s,t) = \lambda(t-s)$.
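A sketch of the nonhomogeneous case, computing $m(s,t)$ by numerical quadrature for a hypothetical seasonal rate function:

```python
from math import exp, factorial, sin, pi
from scipy.integrate import quad

def nhpp_pmf(k, s, t, rate):
    """Pr[N(t) - N(s) = k] for a nonhomogeneous Poisson process with rate(u)."""
    m, _ = quad(rate, s, t)              # m(s, t), the integrated rate over (s, t]
    return exp(-m) * m ** k / factorial(k)

# Hypothetical seasonal claim rate oscillating around 0.10 per year:
rate = lambda u: 0.10 * (1 + 0.5 * sin(2 * pi * u))
print(nhpp_pmf(0, 0.0, 1.0, rate))       # over a full year the sine averages out: exp(-0.10)
```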

1.4 Mixed Poisson Distributions

1.4.1 Expectations of General Random Variables

Mixed Poisson distributions involve expectations of Poisson probabilities with a random parameter. Therefore, we need to be able to compute expectations with respect to general distribution functions.

Continuous probability distributions are widely used in probability and statistics when the underlying random phenomenon is measured on a continuous scale. If the distribution function is a continuous function, the associated probability distribution is called a continuous distribution. Note that in this case,

$$\Pr[X=x] = \lim_{h\searrow 0}\Pr[x-h < X \le x] = 0 \quad \text{for every } x.$$

If the distribution function can moreover be represented as $F_X(x) = \int_{-\infty}^x f_X(y)\,dy$, $x \in \mathbb{R}$, for some nonnegative function $f_X$, the distribution is said to be absolutely continuous and $f_X$ is the associated probability density function.


The interpretation of the probability density function is that

$$\Pr[x \le X \le x+h] \approx f_X(x)h \quad \text{for small } h > 0.$$

That is, the probability that a random variable, with an absolutely continuous probability distribution, takes a value in a small interval of length $h$ is given by the probability density function times the length of the interval.

A general type of distribution function is a combination of the discrete and (absolutely) continuous cases, being continuous apart from a countable set of exception points $\{d_1, d_2, d_3, \ldots\}$ with positive probabilities of occurrence, causing jumps in the distribution function at these points. Such a distribution function $F_X$ can be represented as

$$F_X(x) = (1-p)F_X^{(c)}(x) + pF_X^{(d)}(x), \qquad x \in \mathbb{R}, \qquad (1.22)$$

for some $p \in [0,1]$, where $F_X^{(c)}$ is a continuous distribution function and $F_X^{(d)}$ is a discrete distribution function with support $\{d_1, d_2, \ldots\}$.

Let us assume that $F_X$ is of the form (1.22) with

$$pF_X^{(d)}(t) = \sum_{d_n \le t}\bigl(F_X(d_n) - F_X(d_n^-)\bigr) = \sum_{d_n \le t}\Pr[X = d_n],$$

where $\{d_1, d_2, \ldots\}$ denotes the set of discontinuity points, and

$$(1-p)F_X^{(c)}(t) = F_X(t) - pF_X^{(d)}(t) = \int_{-\infty}^t f_X^{(c)}(x)\,dx.$$

Then,

$$\mathrm{E}[X] = \sum_{n\ge 1} d_n\bigl(F_X(d_n) - F_X(d_n^-)\bigr) + \int_{-\infty}^{+\infty} x f_X^{(c)}(x)\,dx = \int_{-\infty}^{+\infty} x\,dF_X(x), \qquad (1.23)$$

where the differential of $F_X$, denoted as $dF_X$, is defined as

$$dF_X(x) = \begin{cases} F_X(d_n) - F_X(d_n^-) & \text{if } x = d_n,\\ f_X^{(c)}(x)\,dx & \text{otherwise.} \end{cases}$$

This unified notation allows us to avoid tedious repetitions of statements like 'the proof is given for continuous random variables; the discrete case is similar'. A very readable introduction to differentials and Riemann–Stieltjes integrals can be found in Carter & Van Brunt (2000).

1.4.2 Heterogeneity and Mixture Models

Definition

Mixture models are a discrete or continuous weighted combination of distributions aimed at representing a heterogeneous population comprised of several (two or more) distinct sub-populations. Such models are typically used when a heterogeneous population of sampling units consists of several sub-populations within each of which a relatively simpler model applies. The source of heterogeneity could be gender, age, geographical area, etc.

Discrete Mixtures

In order to define a mixture model mathematically, suppose the distribution of $N$ can be represented by a probability mass function of the form

$$\Pr[N=k] = p(k|\boldsymbol{\Psi}) = q_1 p_1(k|\boldsymbol{\theta}_1) + \cdots + q_\kappa p_\kappa(k|\boldsymbol{\theta}_\kappa), \qquad (1.24)$$

where $\boldsymbol{\Psi} = (\boldsymbol{q}^T, \boldsymbol{\theta}^T)^T$, $\boldsymbol{q}^T = (q_1, \ldots, q_\kappa)$ and $\boldsymbol{\theta}^T = (\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_\kappa)$. The model is usually referred to as a discrete (or finite) mixture model. Here $\boldsymbol{\theta}_j$ is a (vector) parameter characterizing the probability mass function $p_j(\cdot|\boldsymbol{\theta}_j)$ and the $q_j$s are mixing weights.

Example 1.1 A particular example of finite mixture is the zero-inflated distribution. It has been observed empirically that counting distributions often show an excess of zeros against the Poisson distribution. In order to accommodate this feature, a combination of the original distribution $p(k)$, $k = 0, 1, \ldots$ (be it Poisson or not) together with the degenerate distribution with all probability concentrated at the origin, gives a finite mixture with

$$\Pr[N=0] = \rho + (1-\rho)p(0)$$
$$\Pr[N=k] = (1-\rho)p(k), \qquad k = 1, 2, \ldots$$

A mixture of this kind is usually referred to as zero-inflated, zero-modified or as a distribution with added zeros.
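As a quick illustration, the zero-inflated probabilities can be coded directly; the values of $\rho$ and of the Poisson mean below are arbitrary:

```python
from scipy.stats import poisson

def zip_pmf(k, rho, lam):
    """Zero-inflated Poisson: extra mass rho is placed at the origin."""
    base = poisson.pmf(k, lam)
    return rho + (1 - rho) * base if k == 0 else (1 - rho) * base

# With rho = 0.3 and lam = 0.2 the zero probability rises from ~0.819 to ~0.873:
print(poisson.pmf(0, 0.2), zip_pmf(0, rho=0.3, lam=0.2))
```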

Model (1.24) allows each component probability mass function to belong to a different parametric family. In most applications, a common parametric family is assumed and thus the mixture model takes the following form

$$p(k|\boldsymbol{\Psi}) = q_1 p(k|\boldsymbol{\theta}_1) + \cdots + q_\kappa p(k|\boldsymbol{\theta}_\kappa), \qquad (1.25)$$

which we assume to hold in the sequel. The mixing weight $q$ can be regarded as a discrete probability function over $\{\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_\kappa\}$, describing the variation in the choice of $\boldsymbol{\theta}$ across the population of interest.

This class of mixture models includes mixtures of Poisson distributions. Such a mixture is adequate to model count data (number of claims reported to an insurance company, number of accidents caused by an insured driver, etc.) where the components of the mixture are Poisson distributions with mean $\lambda_j$. In that respect, (1.25) means that there are $\kappa$ categories of policyholders, with annual expected claim frequencies $\lambda_1, \lambda_2, \ldots, \lambda_\kappa$, respectively. The proportion of the portfolio in the different categories is $q_1, q_2, \ldots, q_\kappa$, respectively. Considering a given policyholder, the actuary does not know to which category he belongs, but the probability that he comes from category $j$ is $q_j$. The probability mass function of the number of claims reported by this insured driver is thus a weighted average of the probability mass functions associated with the $\kappa$ categories.


Continuous Mixtures

Multiplying the number of categories in (1.25) often leads to a dramatic increase in the number of parameters (the $q_j$s and the $\boldsymbol{\theta}_j$s). For large $\kappa$, it is therefore preferable to switch to a continuous mixture, where the sum in (1.25) is replaced with an integral with respect to some simple parametric continuous probability density function.

Specifically, if we allow $\boldsymbol{\theta}$ to be continuous with probability density function $g(\cdot)$, the finite mixture model suggested above is replaced by the probability mass function

$$p(k) = \int p(k|\theta)g(\theta)\,d\theta,$$

which is often referred to as a mixture distribution. When $g(\cdot)$ is modelled without parametric assumptions, the probability mass function $p(\cdot)$ is a semiparametric mixture model. Often in actuarial science, $g(\cdot)$ is taken from some parametric family, so that the resulting probability mass function is also parametric.

Mixed Poisson Model for the Number of Claims

The Poisson distribution often poorly fits observations made in a portfolio of policyholders. This is in fact due to the heterogeneity that is present in the portfolio: driving abilities vary from individual to individual. Therefore it is natural to multiply the mean frequency $\lambda$ of the Poisson distribution by a positive random effect $\Theta$. The frequency will vary within the portfolio according to the nonobservable random variable $\Theta$. Obviously we will choose $\Theta$ such that $\mathrm{E}[\Theta] = 1$ because we want to obtain, on average, the frequency $\lambda$ of the portfolio. Conditional on $\Theta$, we then have

$$\Pr[N=k|\Theta=\theta] = p(k|\lambda\theta) = \exp(-\lambda\theta)\frac{(\lambda\theta)^k}{k!}, \qquad k = 0, 1, \ldots, \qquad (1.26)$$

where $p(\cdot|\lambda\theta)$ is the Poisson probability mass function with mean $\lambda\theta$. The interpretation we give to this model is that not all policyholders in the portfolio have an identical frequency $\lambda$. Some of them have a higher frequency ($\lambda\theta$ with $\theta \ge 1$), others have a lower frequency ($\lambda\theta$ with $\theta \le 1$). Thus we use a random effect to model this empirical observation.

The annual number of accidents caused by a randomly selected policyholder of the portfolio is then distributed according to a mixed Poisson law. In this case, the probability that a randomly selected policyholder reports $k$ claims to the company is obtained by averaging the conditional probabilities (1.26) with respect to $\Theta$. In general, $\Theta$ is neither discrete nor continuous but of mixed type. The probability mass function associated with mixed Poisson models is defined as

$$\Pr[N=k] = \mathrm{E}\bigl[p(k|\lambda\Theta)\bigr] = \int_0^{+\infty}\exp(-\lambda\theta)\frac{(\lambda\theta)^k}{k!}\,dF_\Theta(\theta), \qquad (1.27)$$

where $F_\Theta$ denotes the distribution function of $\Theta$, assumed to fulfill $F_\Theta(0) = 0$. The mixing distribution described by $F_\Theta$ represents the heterogeneity of the portfolio of interest; $dF_\Theta$ is often called the structure function. It is worth mentioning that the mixed Poisson model (1.27) is an accident-proneness model: it assumes that a policyholder's mean claim frequency does not change over time but allows some insured persons to have higher mean claim frequencies than others. We will say that $N$ is mixed Poisson distributed with parameter $\lambda$ and risk level $\Theta$, denoted as $N \sim \mathcal{MP}oi(\lambda, \Theta)$, when it has probability mass function (1.27).
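Formula (1.27) can be evaluated by numerical quadrature for any continuous mixing density. The sketch below uses a Gamma(a, a) structure function with illustrative parameter values; this particular choice is developed analytically in Section 1.4.5:

```python
from scipy.integrate import quad
from scipy.stats import poisson, gamma

def mixed_poisson_pmf(k, lam, mixing_pdf):
    """Evaluate (1.27): average the Poi(lam*theta) mass over the mixing density."""
    integrand = lambda theta: poisson.pmf(k, lam * theta) * mixing_pdf(theta)
    value, _ = quad(integrand, 0, float("inf"))
    return value

# Gamma(a, a) mixing (unit mean), with illustrative a = 1.5 and lam = 0.15:
a, lam = 1.5, 0.15
g = lambda theta: gamma.pdf(theta, a, scale=1.0 / a)
print([round(mixed_poisson_pmf(k, lam, g), 5) for k in range(4)])
```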


Remark 1.1 Note that a better notation would have been $\mathcal{MP}oi(\lambda, F_\Theta)$ instead of $\mathcal{MP}oi(\lambda, \Theta)$ since only the distribution function of $\Theta$ matters to define the associated Poisson mixture. We have nevertheless opted for $\mathcal{MP}oi(\lambda, \Theta)$ for simplicity.

Note that the condition $\mathrm{E}[\Theta]=1$ ensures that when $N \sim \mathcal{MP}oi(\lambda, \Theta)$,

$$\mathrm{E}[N] = \int_0^{+\infty}\sum_{k=0}^{+\infty}k\exp(-\lambda\theta)\frac{(\lambda\theta)^k}{k!}\,dF_\Theta(\theta) = \lambda\mathrm{E}[\Theta] = \lambda,$$

or, more briefly,

$$\mathrm{E}[N] = \mathrm{E}\bigl[\mathrm{E}[N|\Theta]\bigr] = \mathrm{E}[\lambda\Theta] = \lambda. \qquad (1.28)$$

In (1.28), $\mathrm{E}[\cdot|\Theta]$ means that we take an expected value considering $\Theta$ as a constant. We then average with respect to all the random components, except $\Theta$. Consequently, $\mathrm{E}[\cdot|\Theta]$ is a function of $\Theta$. Given $\Theta$, $N$ is Poisson distributed with mean $\lambda\Theta$ so that $\mathrm{E}[N|\Theta] = \lambda\Theta$. The mean of $N$ is finally obtained by averaging $\mathrm{E}[N|\Theta]$ with respect to $\Theta$. The expectation of $N$ given in (1.28) is thus the same as the expectation of a $\mathcal{P}oi(\lambda)$ distributed random variable. Taking the heterogeneity into account by switching from the $\mathcal{P}oi(\lambda)$ to the $\mathcal{MP}oi(\lambda, \Theta)$ distribution has no effect on the expected claim number.

1.4.3 Mixed Poisson Process

The Poisson processes are suitable models for many real counting phenomena but they are insufficient in some cases because of the deterministic character of their intensity function. The doubly stochastic Poisson process (or Cox process) is a generalization of the Poisson process where the rate of occurrence is influenced by an external process such that the rate becomes a random process. So the rate, instead of being constant (homogeneous Poisson process) or a deterministic function of time (nonhomogeneous Poisson process), becomes itself a stochastic process. The only restriction on the rate process is that it has to be nonnegative. Mixed Poisson distributions are linked to mixed Poisson processes in the same way that the Poisson distribution is associated with the standard Poisson process.

Specifically, let us assume that given $\Theta = \theta$, $\{N(t),\, t\ge 0\}$ is a homogeneous Poisson process with rate $\lambda\theta$. Then $\{N(t),\, t\ge 0\}$ is a mixed Poisson process, and for any $s, t \ge 0$, the probability that $k$ events occur during the time interval $(s, s+t]$ is

$$\Pr[N(t+s)-N(s)=k] = \int_0^{+\infty}\Pr[N(t+s)-N(s)=k|\Theta=\theta]\,dF_\Theta(\theta)
= \int_0^{+\infty}\exp(-\lambda\theta t)\frac{(\lambda\theta t)^k}{k!}\,dF_\Theta(\theta),$$

that is, $N(t+s)-N(s) \sim \mathcal{MP}oi(\lambda t, \Theta)$. Note that, in contrast to the Poisson process, mixed Poisson processes have dependent increments. Hence, past numbers of claims reveal future numbers of claims in this setting (in contrast to the Poisson case).


1.4.4 Properties of Mixed Poisson Distributions

Moments and Overdispersion

If $N \sim \mathcal{MP}oi(\lambda, \Theta)$ then its second moment is

$$\mathrm{E}[N^2] = \int_0^{+\infty}\bigl(\lambda\theta + \lambda^2\theta^2\bigr)\,dF_\Theta(\theta) = \lambda\mathrm{E}[\Theta] + \lambda^2\mathrm{E}[\Theta^2],$$

so that

$$\mathrm{V}[N] = \lambda\mathrm{E}[\Theta] + \lambda^2\mathrm{E}[\Theta^2] - \lambda^2\bigl(\mathrm{E}[\Theta]\bigr)^2 = \lambda + \lambda^2\mathrm{V}[\Theta]. \qquad (1.29)$$

It is then easily seen that the variance of $N$ exceeds its mean, that is,

$$\mathrm{V}[N] = \lambda + \lambda^2\mathrm{V}[\Theta] \ge \lambda = \mathrm{E}[N]. \qquad (1.30)$$

Therefore, unless $\Theta$ is degenerate at 1, we observe that mixed Poisson distributions are overdispersed: the variance exceeds the mean. The skewness can be expressed as

$$\gamma(N) = \frac{1}{\bigl(\mathrm{V}[N]\bigr)^{3/2}}\left(3\mathrm{V}[N] - 2\mathrm{E}[N] + \frac{\bigl(\mathrm{V}[N]-\mathrm{E}[N]\bigr)^2}{\mathrm{E}[N]}\,\frac{\gamma(\Theta)}{\sqrt{\mathrm{V}[\Theta]}}\right). \qquad (1.31)$$

Shaked’s Two Crossings Theorem

Recall that $\mathrm{E}[\phi(X)] \ge \phi(\mathrm{E}[X])$ for any random variable $X$ and convex function $\phi$. This inequality, known as the Jensen inequality, ensures that if $N \sim \mathcal{MP}oi(\lambda, \Theta)$ then

$$\Pr[N=0] = \int_0^{+\infty}\exp(-\lambda\theta)\,dF_\Theta(\theta) \ge \exp\left(-\lambda\int_0^{+\infty}\theta\,dF_\Theta(\theta)\right) = \exp(-\lambda),$$

showing that mixed Poisson distributions have an excess of zeros compared to Poisson distributions with the same mean. This is in line with empirical studies, where actuaries often observe more policyholders producing 0 claims than the number predicted by the Poisson model.

The following result, which has been proved by Shaked (1980), reinforces this straightforward conclusion.

Property 1.2 Let $N$ be mixed Poisson distributed with mean $\mathrm{E}[N] = \lambda$. Then there exist two integers $0 \le k_0 < k_1$ such that

$$\Pr[N=k] \ge \exp(-\lambda)\frac{\lambda^k}{k!} \quad \text{for } k = 0, \ldots, k_0 \text{ and for } k \ge k_1,$$

whereas $\Pr[N=k] \le \exp(-\lambda)\lambda^k/k!$ for $k = k_0+1, \ldots, k_1-1$.


Shaked’s Two Crossings Theorem tells us (i) that the mixed Poisson distribution has an excess of zeros compared to the Poisson distribution with the same mean and (ii) that the mixed Poisson distribution has a thicker right tail than the Poisson distribution with the same mean.

Probability Generating Function

The probability generating function of Poisson mixtures is closely related to the moment generating function of the underlying random effect. Moment generating functions are a widely used tool in many statistics texts, and also in actuarial mathematics. They serve to prove statements about convolutions of distributions, and also about limits. Recall that the moment generating function of the nonnegative random variable $X$, denoted as $M_X$, is given by

$$M_X(t) = \mathrm{E}[\exp(tX)], \qquad t > 0.$$

It is interesting to mention that $M_X$ characterizes the probability distribution of $X$, i.e. the information contained in $F_X$ and $M_X$ is equivalent. If there exists $h > 0$ such that $M_X(t)$ exists and is finite for $0 < t < h$, then all the moments of $X$ are finite. For $N \sim \mathcal{MP}oi(\lambda, \Theta)$, conditioning on $\Theta$ gives the probability generating function of $N$ as

$$\varphi_N(z) = \mathrm{E}\bigl[\mathrm{E}[z^N|\Theta]\bigr] = \mathrm{E}\bigl[\exp\bigl(\lambda\Theta(z-1)\bigr)\bigr] = M_\Theta\bigl(\lambda(z-1)\bigr). \qquad (1.33)$$

From (1.33), we see that the knowledge of the mixed Poisson distribution $\mathcal{MP}oi(\lambda, \Theta)$ is equivalent to the knowledge of $F_\Theta$. The mixed Poisson distributions are thus identifiable, that is, having $N_1 \sim \mathcal{MP}oi(\lambda, \Theta_1)$ and $N_2 \sim \mathcal{MP}oi(\lambda, \Theta_2)$, then $N_1$ and $N_2$ are identically distributed if, and only if, $\Theta_1$ and $\Theta_2$ are identically distributed.

1.4.5 Negative Binomial Distribution

Gamma Distribution

Recall that a random variable $X$ is distributed according to the two-parameter Gamma distribution, which will henceforth be denoted as $X \sim \mathcal{G}am(\alpha, \tau)$, if its probability density function is given by

$$f(x) = \frac{x^{\alpha-1}\tau^\alpha\exp(-\tau x)}{\Gamma(\alpha)}, \qquad x > 0. \qquad (1.34)$$

Note that when $\alpha = 1$, the Gamma distribution reduces to the Negative Exponential one (which is denoted as $X \sim \mathcal{E}xp(\tau)$) with probability density function

$$f(x) = \tau\exp(-\tau x), \qquad x > 0.$$

The distribution function $F$ of $X$ can be expressed in terms of the incomplete Gamma function $\Gamma(\cdot, \cdot)$. Specifically, if $X \sim \mathcal{G}am(\alpha, \tau)$, then $F(x) = \Gamma(\alpha, \tau x)$.

Probability Mass Function

The Negative Binomial distribution is a widely used alternative to the Poisson distribution for handling count data when the variance is appreciably greater than the mean (this condition, known as overdispersion, is frequently met in practice, as discussed above).

There are several models that lead to the Negative Binomial distribution. A classic example arises from the theory of accident proneness which was developed after Greenwood & Yule (1920). This theory assumes that the number of accidents suffered by an individual is Poisson distributed, but that the Poisson mean (interpreted as the individual's accident proneness) varies between individuals in the population under study. If the Poisson mean is assumed to be Gamma distributed, then the Negative Binomial is the resultant overall distribution of accidents per individual.

Specifically, completing (1.26)–(1.27) with $\Theta \sim \mathcal{G}am(a, a)$, that is, with probability density function

$$f(\theta) = \frac{1}{\Gamma(a)}a^a\theta^{a-1}\exp(-a\theta), \qquad \theta > 0, \qquad (1.35)$$

yields the Negative Binomial probability mass function

$$\Pr[N=k] = \frac{a(a+1)\cdots(a+k-1)}{k!}\left(\frac{a}{a+\lambda d}\right)^a\left(\frac{\lambda d}{a+\lambda d}\right)^k
= \frac{\Gamma(a+k)}{\Gamma(a)k!}\left(\frac{a}{a+\lambda d}\right)^a\left(\frac{\lambda d}{a+\lambda d}\right)^k, \qquad k = 0, 1, 2, \ldots,$$

where $\lambda$ is the annual expected claim number and $d$ is the length of the observation period (the exposure-to-risk). The probability mass function can be expressed using the generalized binomial coefficient:

$$\Pr[N=k] = \frac{\Gamma(a+k)}{\Gamma(a)\Gamma(k+1)}\left(\frac{a}{a+\lambda d}\right)^a\left(\frac{\lambda d}{a+\lambda d}\right)^k
= \binom{a+k-1}{k}\left(\frac{a}{a+\lambda d}\right)^a\left(\frac{\lambda d}{a+\lambda d}\right)^k, \qquad k = 0, 1, 2, \ldots$$

Henceforth, we write $N \sim \mathcal{NB}in(a, \lambda d)$ to indicate that $N$ obeys the Negative Binomial distribution with parameters $a$ and $\lambda d$. This model has been applied to retail purchasing, absenteeism and doctors' consultations, amongst many others.
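In practice the Negative Binomial probabilities need not be coded by hand: scipy's nbinom matches the parameterization above through $(n, p) = (a, a/(a+\lambda d))$. A sketch with arbitrary illustrative parameters:

```python
from scipy.stats import nbinom

def nb_claim_pmf(k, a, lam, d=1.0):
    """NBin(a, lam*d) as a Poisson-Gamma mixture; scipy uses (n, p) = (a, a/(a+lam*d))."""
    return nbinom.pmf(k, a, a / (a + lam * d))

a, lam = 1.8, 0.12
print([round(nb_claim_pmf(k, a, lam), 5) for k in range(4)])
print(lam, lam + lam**2 / a)   # theoretical mean and variance: V > E (overdispersion)
```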

Moments

If $X \sim \mathcal{G}am(\alpha, \tau)$, its mean is $\mathrm{E}[X] = \alpha/\tau$ and its variance is $\mathrm{V}[X] = \alpha/\tau^2$. If $N \sim \mathcal{NB}in(a, \lambda d)$ then the mean is $\mathrm{E}[N] = \lambda d$ and the variance is $\mathrm{V}[N] = \lambda d + (\lambda d)^2/a$ according to (1.29). It can be shown that $\gamma(\Theta)/\sqrt{\mathrm{V}[\Theta]} = 2$ in (1.31) for the Negative Binomial distribution.

Probability Generating Function

If $X \sim \mathcal{G}am(\alpha, \tau)$, its moment generating function is

$$M(t) = \left(1 - \frac{t}{\tau}\right)^{-\alpha} \quad \text{if } t < \tau. \qquad (1.36)$$

The probability generating function of $N \sim \mathcal{NB}in(a, \lambda d)$ is

$$\varphi_N(z) = \left(\frac{a}{a - \lambda d(z-1)}\right)^a. \qquad (1.37)$$

This result comes from (1.33) together with (1.36).

True and Apparent Contagion

Apparent contagion arises from the recognition that sampled individuals come from a heterogeneous population in which individuals have a constant but different propensity to experience accidents. A given individual may have a high (or low) propensity for accidents but occurrence of an accident does not make it more (or less) likely that another accident will occur. However, aggregation across heterogeneous individuals may generate a misleading statistical finding which suggests that occurrence of an accident increases the probability of another accident; the unobserved but persistent heterogeneity can be misinterpreted as being due to a strong serial dependence.

True contagion refers to dependence between the occurrences of successive events. The occurrence of an event, such as an accident or illness, may change the probability of subsequent occurrences of similar events. True positive contagion implies that the occurrence of an event shortens the expected waiting time to the next occurrence of the event. The alleged phenomenon of accident proneness can be interpreted in terms of true contagion as suggesting that an individual who has experienced an accident is more likely to experience another accident. In a longitudinal setting, actual and future outcomes are directly influenced by past values, and this causes a substantial change over time in the corresponding distribution.

Since with event count data we only observe the total number of events at the end of the period, contagion, like heterogeneity, is an unobserved, within-observation process. For research problems where both heterogeneity and contagion are plausible, the different underlying processes are not distinguishable with aggregate count data because they both lead to the same probability distribution for the counts. One can still use this distribution to derive fully efficient and consistent estimates, but this analysis will only be suggestive of the underlying process.

Poisson Limiting Form

The Negative Binomial distribution has a Poisson limiting form if $\mathrm{V}[\Theta] = \frac{1}{a} \to 0$. This result can be recovered from the sequence of the probability generating functions, noting that

$$\lim_{a\nearrow+\infty}\left(\frac{a}{a + \lambda d(1-z)}\right)^a = \lim_{a\nearrow+\infty}\left(1 + \frac{\lambda d}{a}(1-z)\right)^{-a} = \exp\bigl(-\lambda d(1-z)\bigr),$$

which is seen to be the probability generating function of the Poisson distribution with parameter $\lambda d$.

Derivation as a Compound Poisson Distribution

A different type of heterogeneity occurs when there is clustering. If it is assumed that the number of clusters is Poisson distributed, but the number of individuals in a cluster is distributed according to the Logarithmic distribution, then the overall distribution is Negative Binomial. In an actuarial context, this amounts to recognizing that several vehicles can be involved in the same accident, each of the insured drivers filing a claim. Therefore, a single accident may generate several claims. If the number of claims per accident follows a Logarithmic distribution, and the number of accidents over the time interval of interest follows a Poisson distribution, then the total number of claims for the time interval can be modelled with the Negative Binomial distribution.

Let us formally establish this result. Recall that the random variable $M$ has a Logarithmic distribution if

$$\Pr[M=k] = \frac{\theta^k}{-k\ln(1-\theta)}, \qquad k = 1, 2, \ldots,$$

where $0 < \theta < 1$. Let the number $K$ of accidents be $\mathcal{P}oi(\lambda)$ distributed, let the numbers of claims $M_1, M_2, \ldots$ arising from the successive accidents be independent Logarithmic random variables, independent of $K$, and define the total number of claims as $N = M_1 + \cdots + M_K$.


The random variable $N$ just defined has a compound Poisson distribution. The probability generating function of a compound distribution is given by

$$\varphi_N(z) = \mathrm{E}\bigl[z^{M_1+\cdots+M_K}\bigr]
= \sum_{k=0}^{+\infty}\Pr[K=k]\,\mathrm{E}\bigl[z^{M_1+\cdots+M_k}\bigr]
= \sum_{k=0}^{+\infty}\Pr[K=k]\bigl(\varphi_M(z)\bigr)^k
= \varphi_K\bigl(\varphi_M(z)\bigr). \qquad (1.38)$$

Note that formula (1.38) is true in general for compound distributions. Replacing $\varphi_K$ and $\varphi_M$ with their expressions gives the probability generating function of $N$:

$$\varphi_N(z) = \exp\Bigl(-\lambda\bigl(1 - \varphi_M(z)\bigr)\Bigr)
= \left(\frac{1-\theta}{1-\theta z}\right)^{-\lambda/\ln(1-\theta)}.$$

It can be checked that the probability generating function $\varphi_N$ corresponds to the probability generating function (1.37) of a Negative Binomial distribution with $d = 1$, $a = -\lambda/\ln(1-\theta)$ and $-\lambda\theta/\bigl((1-\theta)\ln(1-\theta)\bigr)$ in place of $\lambda$.
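A simulation sketch of this construction (the rates are chosen arbitrarily): drawing a Poisson number of accidents and Logarithmic claim counts per accident should reproduce the Negative Binomial mean and variance implied by the parameter matching above.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
lam, theta = 0.4, 0.25          # accident rate and Logarithmic parameter (illustrative)

def total_claims():
    k = rng.poisson(lam)                      # number of accidents
    # numpy's logseries sampler draws from the Logarithmic(theta) distribution
    return rng.logseries(theta, size=k).sum() if k else 0

sample = np.array([total_claims() for _ in range(200_000)])
a = -lam / np.log(1 - theta)                  # matching NB parameters (d = 1)
lam_nb = -lam * theta / ((1 - theta) * np.log(1 - theta))
print(sample.mean(), lam_nb)                  # NB mean
print(sample.var(), lam_nb + lam_nb**2 / a)   # NB variance
```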

1.4.6 Poisson-Inverse Gaussian Distribution

There is no reason to restrict ourselves to the Gamma distribution for $\Theta$, except perhaps mathematical convenience. In fact, any distribution with support in the positive half of the real line is a candidate to model the stochastic behaviour of $\Theta$. Here, we discuss the Inverse Gaussian distribution.

Inverse Gaussian Distribution

The Inverse Gaussian distribution is an ideal candidate for modelling positive, right-skewed data. Recall that a random variable $X$ is distributed according to the Inverse Gaussian distribution, which will be henceforth denoted as $X \sim \mathcal{IG}au(\mu, \alpha)$, if its probability density function is given by

$$f(x) = \frac{\mu}{\sqrt{2\pi\alpha x^3}}\exp\left(-\frac{1}{2\alpha x}(x-\mu)^2\right), \qquad x > 0. \qquad (1.39)$$

If $X \sim \mathcal{IG}au(\mu, \alpha)$ then the mean is $\mathrm{E}[X] = \mu$ and the variance is $\mathrm{V}[X] = \alpha\mu$. The moment generating function is given by

$$M(t) = \int_0^{+\infty}\frac{\mu}{\sqrt{2\pi\alpha x^3}}\exp\left(-\frac{1}{2\alpha x}(x-\mu)^2 + tx\right)dx
= \exp\left(\frac{\mu}{\alpha}\right)\int_0^{+\infty}\frac{\mu}{\sqrt{2\pi\alpha x^3}}\exp\left(-\frac{1}{2\alpha x}\bigl(x^2(1-2\alpha t) + \mu^2\bigr)\right)dx.$$

Making the change of variable $\xi = x\sqrt{1-2\alpha t}$, the remaining integral becomes $\exp\bigl(-\mu\sqrt{1-2\alpha t}/\alpha\bigr)$ times the probability density function of the $\mathcal{IG}au\bigl(\mu, \alpha/\sqrt{1-2\alpha t}\bigr)$ distribution, which integrates to one, so that

$$M(t) = \exp\left(\frac{\mu}{\alpha}\left(1 - \sqrt{1-2\alpha t}\right)\right). \qquad (1.40)$$

For the last three decades, the Inverse Gaussian distribution has gained attention in describing and analyzing right-skewed data. The main appeal of Inverse Gaussian models lies in the fact that they can accommodate a variety of shapes, from highly skewed to almost Normal. Moreover, they share many elegant and convenient properties with Gaussian models. In applied probability, the Inverse Gaussian distribution arises as the distribution of the first passage time to an absorbing barrier located at a unit distance from the origin in a Wiener process.

Poisson-Inverse Gaussian Distribution

Let us now complete (1.26)–(1.27) with $\Theta \sim \mathcal{IG}au(1, \alpha)$, that is,

$$f(\theta) = \frac{1}{\sqrt{2\pi\alpha\theta^3}}\exp\left(-\frac{1}{2\alpha\theta}(\theta-1)^2\right), \qquad \theta > 0. \qquad (1.41)$$

The probability mass function is given by

$$\Pr[N=k] = \int_0^{+\infty}\exp(-\lambda d\theta)\frac{(\lambda d\theta)^k}{k!}\frac{1}{\sqrt{2\pi\alpha\theta^3}}\exp\left(-\frac{1}{2\alpha\theta}(\theta-1)^2\right)d\theta. \qquad (1.42)$$

The probability mass function can be expressed using modified Bessel functions of the second kind. Bessel functions have some useful properties that can be used to compute the Poisson-Inverse Gaussian probabilities and to find the maximum likelihood estimators, for instance.

Moments and Probability Generating Function

Considering (1.28) and (1.29), we have

$$\mathrm{E}[N] = \lambda \quad \text{and} \quad \mathrm{V}[N] = \lambda + \alpha\lambda^2.$$

It can be shown that $\gamma(\Theta)/\sqrt{\mathrm{V}[\Theta]} = 3$ in (1.31) for the Poisson-Inverse Gaussian distribution. Therefore the skewness of a Poisson-Inverse Gaussian distribution exceeds the skewness of the Negative Binomial distribution having the same mean and the same variance.

Setting $\mu = 1$ and $d = 1$, the probability generating function of $N$ can be obtained from (1.33) together with (1.40), which gives

$$\varphi_N(z) = \exp\left(\frac{1}{\alpha}\left(1 - \sqrt{1 - 2\alpha\lambda(z-1)}\right)\right).$$


Computation of the Probability Mass Function

The probability mass at the origin is

$$\varphi_N(0) = \Pr[N=0] = \exp\left(\frac{1}{\alpha}\left(1 - \sqrt{1 + 2\alpha\lambda}\right)\right).$$

Now, taking the derivatives of $\varphi_N$ with respect to $z$, and evaluating them at 0, gives the probability mass function for positive integers. Specifically,

$$\varphi_N'(0) = \Pr[N=1] = \left.\frac{\lambda}{\sqrt{1 - 2\alpha\lambda(z-1)}}\varphi_N(z)\right|_{z=0} = \frac{\lambda}{\sqrt{1 + 2\alpha\lambda}}\Pr[N=0]$$

and

$$\varphi_N''(0) = 2\Pr[N=2] = \left.\frac{\alpha\lambda^2}{\bigl(1 - 2\alpha\lambda(z-1)\bigr)^{3/2}}\varphi_N(z)\right|_{z=0} + \left.\frac{\lambda}{\sqrt{1 - 2\alpha\lambda(z-1)}}\varphi_N'(z)\right|_{z=0}$$
$$= \frac{\alpha\lambda^2}{(1 + 2\alpha\lambda)^{3/2}}\Pr[N=0] + \frac{\lambda}{\sqrt{1 + 2\alpha\lambda}}\Pr[N=1],$$

so that

$$\Pr[N=2] = \frac{\alpha\lambda}{2(1 + 2\alpha\lambda)}\Pr[N=1] + \frac{\lambda^2}{2(1 + 2\alpha\lambda)}\Pr[N=0].$$

In general, we have the following recursive formula

$$\Pr[N=n] = \frac{2\alpha\lambda}{1 + 2\alpha\lambda}\left(1 - \frac{3}{2n}\right)\Pr[N=n-1] + \frac{\lambda^2}{(1 + 2\alpha\lambda)n(n-1)}\Pr[N=n-2], \qquad (1.43)$$

valid for $n = 2, 3, 4, \ldots$, which allows us to compute the probability mass function of the Poisson-Inverse Gaussian distribution. The formal proof of (1.43) is based on properties of the modified Bessel function.
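The recursion (1.43), together with the starting values for $\Pr[N=0]$ and $\Pr[N=1]$ obtained above, gives a simple way to compute the Poisson-Inverse Gaussian probabilities. A minimal sketch (the parameter values are illustrative):

```python
from math import exp, sqrt

def pig_pmf(n_max, lam, alpha):
    """Poisson-Inverse Gaussian probabilities via the recursion (1.43)."""
    c = 1 + 2 * alpha * lam
    p = [exp((1 - sqrt(c)) / alpha)]                 # Pr[N = 0]
    p.append(lam / sqrt(c) * p[0])                   # Pr[N = 1]
    for n in range(2, n_max + 1):
        p.append(2 * alpha * lam / c * (1 - 3 / (2 * n)) * p[n - 1]
                 + lam**2 / (c * n * (n - 1)) * p[n - 2])
    return p

probs = pig_pmf(30, lam=0.15, alpha=0.8)
print(probs[:4], sum(probs))   # the probabilities should sum to ~1
```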

1.4.7 Poisson-LogNormal Distribution

In addition to the Gamma and Inverse Gaussian distributions to model $\Theta$, the LogNormal distribution is often used in biostatistical studies.
distribution is <strong>of</strong>ten used in biostatistical studies.


LogNormal Distribution

Recall that a random variable $X$ is Normally distributed with mean $\mu$ and variance $\sigma^2$, denoted as $X \sim \mathcal{N}or(\mu, \sigma^2)$, if its distribution function is

$$F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right),$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x\exp(-y^2/2)\,dy. \qquad (1.44)$$

Now, a random variable $X$ is LogNormally distributed with parameters $\mu$ and $\sigma$ (notation $X \sim \mathcal{LN}or(\mu, \sigma^2)$) if $\ln X$ is Normally distributed with mean $\mu$ and variance $\sigma^2$, that is, if its probability density function is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left(-\frac{1}{2\sigma^2}(\ln x - \mu)^2\right), \qquad x > 0.$$

If $X \sim \mathcal{LN}or(\mu, \sigma^2)$, then its mean is

$$\mathrm{E}[X] = \exp\left(\mu + \frac{\sigma^2}{2}\right)$$

and its variance

$$\mathrm{V}[X] = \exp\bigl(2\mu + \sigma^2\bigr)\bigl(\exp(\sigma^2) - 1\bigr).$$

Poisson-LogNormal Distribution

Taking $\mu = -\sigma^2/2$ (to ensure that $\mathrm{E}[\Theta] = 1$), the probability density function of $\Theta \sim \mathcal{LN}or(-\sigma^2/2, \sigma^2)$ is

$$f(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma\theta}\exp\left(-\frac{(\ln\theta + \sigma^2/2)^2}{2\sigma^2}\right), \qquad \theta > 0. \qquad (1.45)$$

The probability mass function of the Poisson-LogNormal distribution is given by

$$\Pr[N=k] = \frac{1}{\sqrt{2\pi}\,\sigma}\frac{(\lambda d)^k}{k!}\int_0^{+\infty}\exp(-\lambda d\theta)\,\theta^{k-1}\exp\left(-\frac{(\ln\theta + \sigma^2/2)^2}{2\sigma^2}\right)d\theta.$$

Coming back to (1.28) and (1.29), we easily see that

$$\mathrm{E}[N] = \lambda \quad \text{and} \quad \mathrm{V}[N] = \lambda + \lambda^2\bigl(\exp(\sigma^2) - 1\bigr).$$

It can be shown that $\gamma(\Theta)/\sqrt{\mathrm{V}[\Theta]} = 2 + \exp(\sigma^2)$ in (1.31) for the Poisson-LogNormal distribution. Therefore the skewness of a Poisson-LogNormal distribution exceeds the skewness of the Poisson-Inverse Gaussian distribution having the same mean and the same variance.


1.5 Statistical Inference for Discrete Distributions

1.5.1 Maximum Likelihood Estimators

Maximum likelihood is a method of estimation and inference for parametric models. The maximum likelihood estimator is the value of the parameter (or parameter vector) that makes the observed data most likely to have occurred given the data generating process assumed to have produced the variable of interest.

The likelihood of a sample of observations is defined as the joint density of the data, with the parameters taken as variable and the data as fixed (multiplied by any arbitrary constant or function of the data but not of the parameters). Specifically, let $N_1, N_2, \ldots, N_n$ be a set of independent and identically distributed outcomes with probability mass function $p(\cdot|\boldsymbol{\theta})$ where $\boldsymbol{\theta}$ is a vector of parameters. The likelihood function is the probability of observing the data $N_1 = k_1, \ldots, N_n = k_n$, that is,

$$\mathcal{L}(\boldsymbol{\theta}) = \prod_{i=1}^n p(k_i|\boldsymbol{\theta}).$$

The key idea for estimation in likelihood problems is that the most reasonable estimate is the value of the parameter vector that would make the observed data most likely to occur. The implicit assumption is of course that the data at hand are reliable. More formally, we seek a value of $\boldsymbol{\theta}$ that maximizes $\mathcal{L}(\boldsymbol{\theta})$. The maximum likelihood estimator of $\boldsymbol{\theta}$ is the random variable $\widehat{\boldsymbol{\theta}}$ for which the likelihood is maximum, that is,

$$\mathcal{L}(\widehat{\boldsymbol{\theta}}) \ge \mathcal{L}(\boldsymbol{\theta}) \quad \text{for all } \boldsymbol{\theta}.$$

It is usually simpler mathematically to find the maximum of the logarithm of the likelihood

$$L(\boldsymbol{\theta}) = \ln\mathcal{L}(\boldsymbol{\theta}) = \sum_{i=1}^n\ln p(k_i|\boldsymbol{\theta})$$

rather than the likelihood itself. The function $L$ is usually referred to as the log-likelihood. Because the logarithm is a monotonic transformation, the log-likelihood will be maximized at the same parameter value that maximizes the likelihood (although the shape of the log-likelihood is different from that of the likelihood).

When working with counting variables, it is often easier to use the observed frequencies

$$f_k = \#\{\text{observations equal to } k\}, \qquad k = 0, 1, 2, \ldots \qquad (1.46)$$

In other words, $f_k$ is the number of times that the value $k$ has been observed in the sample. Denoting the largest observation as

$$k_{\max} = \max_{i=1,\ldots,n} k_i,$$

the log-likelihood becomes

$$L(\boldsymbol{\theta}) = \sum_{k=0}^{k_{\max}} f_k\ln p(k|\boldsymbol{\theta}).$$


We may solve analytically for the maximum likelihood estimator. To maximize any regular function, we find the value of the parameters that makes the first derivatives of the function with respect to the parameters equal to zero. The first derivative of the log-likelihood is called Fisher's score, and is denoted by

$$U_j(\boldsymbol{\theta}) = \frac{\partial}{\partial\theta_j}L(\boldsymbol{\theta}), \qquad j = 1, \ldots, \dim(\boldsymbol{\theta}). \qquad (1.47)$$

Then one can find the maximum likelihood estimator by setting the score to zero, i.e. by solving the system of equations

$$U_j(\boldsymbol{\theta}) = 0, \qquad j = 1, \ldots, \dim(\boldsymbol{\theta}).$$

We also check the second derivatives to ensure that this is a maximum.

Example 1.2 Assume that policyholder $i$ has been observed during a period $d_i$ and produced $k_i$ claims. Assuming that the annual number of claims is Poisson distributed with mean $\lambda$ (here, $\lambda$ is the annual expected claim frequency, so that policyholder $i$ is expected to produce $\lambda d_i$ claims during the observation period, under the conditions of the Poisson process), the log-likelihood is

$$L(\lambda) = \ln\prod_{i=1}^n\left(\exp(-\lambda d_i)\frac{(\lambda d_i)^{k_i}}{k_i!}\right)
= -\lambda\sum_{i=1}^n d_i + \sum_{i=1}^n k_i\ln\lambda + \sum_{i=1}^n k_i\ln d_i - \sum_{i=1}^n\ln k_i!
= -\lambda\sum_{i=1}^n d_i + \ln\lambda\sum_{i=1}^n k_i + \text{constant}.$$

Setting the first derivative of $L$ with respect to $\lambda$ equal to 0 gives

$$-\sum_{i=1}^n d_i + \frac{1}{\lambda}\sum_{i=1}^n k_i = 0 \;\Rightarrow\; \widehat{\lambda} = \frac{\sum_{i=1}^n k_i}{\sum_{i=1}^n d_i}.$$

The second derivative is

$$-\frac{1}{\lambda^2}\sum_{i=1}^n k_i < 0 \quad \text{for any } \lambda,$$

so that $\widehat{\lambda}$ indeed corresponds to the maximum of $L$. The estimated annual expected claim frequency $\widehat{\lambda}$ is thus obtained as the ratio of the total number of claims to the total exposure-to-risk. It is important to note here that the total number of claims is not divided by the number of policies, because of unequal risk exposure.
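In code, the Poisson maximum likelihood estimate is a one-liner once claims and exposures are collected; the small portfolio below is hypothetical:

```python
import numpy as np

# Hypothetical observed portfolio: claim counts and exposures (in years)
k = np.array([0, 0, 1, 0, 2, 0, 0, 1])
d = np.array([1.0, 0.5, 1.0, 1.0, 0.75, 1.0, 0.25, 1.0])

lam_hat = k.sum() / d.sum()   # total claims over total exposure-to-risk
print(lam_hat)
```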


Example 1.3 Assume, as above, that policyholder $i$ has been observed during a period $d_i$ and produced $k_i$ claims. Assuming that the annual number of claims filed by policyholder $i$ is Negative Binomially distributed with mean $\lambda d_i$, the log-likelihood is

$$L(a, \lambda) = \sum_{i=1}^n\sum_{j=0}^{k_i-1}\ln(a+j) + na\ln a - \sum_{i=1}^n(a+k_i)\ln(a+\lambda d_i) + \ln\lambda\sum_{i=1}^n k_i + \text{constant}.$$

The maximum likelihood estimators for $a$ and $\lambda$ solve

$$\frac{\partial}{\partial a}L(a,\lambda) = \sum_{i=1}^n\sum_{j=0}^{k_i-1}\frac{1}{a+j} + n\ln a + n - \sum_{i=1}^n\ln(a+\lambda d_i) - \sum_{i=1}^n\frac{a+k_i}{a+\lambda d_i} = 0$$

$$\frac{\partial}{\partial\lambda}L(a,\lambda) = -\sum_{i=1}^n\frac{(a+k_i)d_i}{a+\lambda d_i} + \frac{1}{\lambda}\sum_{i=1}^n k_i = 0.$$

These equations do not possess explicit solutions, and must be solved numerically. A convenient choice is to use the Newton–Raphson algorithm (see Section 1.5.3). Initial values for the parameters are obtained by the method of moments, as sketched in the code below. Specifically, the moment estimator of $\lambda$ is simply

$$\widehat{\lambda} = \frac{\sum_{i=1}^n k_i}{\sum_{i=1}^n d_i},$$

which is the maximum likelihood estimate of $\lambda$ in the homogeneous Poisson case. For the variance, we start from $\mathrm{V}[N_i] = \mathrm{E}[N_i] + \tau(\lambda d_i)^2$ where $\tau = \mathrm{V}[\Theta_i]$. The empirical analogue is given by

$$\widehat{\tau} = \frac{\sum_{i=1}^n\bigl((k_i - \widehat{\lambda}d_i)^2 - \widehat{\lambda}d_i\bigr)}{\sum_{i=1}^n(\widehat{\lambda}d_i)^2},$$

from which we easily deduce an estimator for $a$ in the Negative Binomial case.
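A sketch of these moment-based starting values on a small hypothetical portfolio; note that $\widehat{\tau}$ must come out positive for the Negative Binomial (with $a = 1/\widehat{\tau}$) to be a sensible model:

```python
import numpy as np

def nb_moment_starts(k, d):
    """Method-of-moments starting values (lam, a) for the NB log-likelihood."""
    lam = k.sum() / d.sum()
    tau = ((k - lam * d) ** 2 - lam * d).sum() / ((lam * d) ** 2).sum()
    return lam, 1.0 / tau        # V[Theta] = 1/a under Gamma(a, a) mixing

k = np.array([0, 0, 1, 0, 2, 0, 0, 1, 3, 0])
d = np.array([1.0, 0.5, 1.0, 1.0, 0.75, 1.0, 0.25, 1.0, 1.0, 0.5])
print(nb_moment_starts(k, d))
```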

1.5.2 Properties of the Maximum Likelihood Estimators

Maximum likelihood estimators enjoy a number of convenient properties that are discussed below. It is important to note that these are asymptotic properties, i.e. properties that hold only as the sample size becomes infinitely large. It is impossible to say in general at what point a sample is large enough for these properties to apply, but the majority of actuarial applications involve large data sets so that actuaries generally trust in the large sample properties of the maximum likelihood estimators.

Consistency

First, maximum likelihood estimators are consistent. There are several definitions of consistency, but an intuitive version is that as the sample size gets large the estimator is increasingly likely to fall within a small region around the true value of the parameter. This is called convergence in probability and is defined more formally as follows: a consistent estimator $T_j$ for some parameter $\theta_j$ computed from a sample of size $n$ is one for which

$$\lim_{n\nearrow+\infty}\Pr\bigl[|T_j - \theta_j| \ge c\bigr] = 0$$

for all positive $c$. This will henceforth be denoted as $T_j \to_{\text{proba}} \theta_j$ as $n \nearrow +\infty$. A consistent estimator is thus an estimator that converges to the population parameter as the sample size goes to infinity. Consistency is an asymptotic property.

Asymptotic Normality

Any estimator will vary across repeated samples. We must be able to calculate this variability in order to express our uncertainty about a parameter value and to make statistical inferences about the parameters. This variability is measured by the variance-covariance matrix of the estimators. This matrix provides the variances for each parameter on the main diagonal while the off-diagonal elements estimate the covariances between all pairs of parameters.

The asymptotic variance-covariance matrix $\Sigma_{\widehat{\boldsymbol{\theta}}}$ for maximum likelihood estimators $\widehat{\boldsymbol{\theta}}$ is the inverse of what is called the Fisher information matrix $\mathcal{I}(\boldsymbol{\theta})$. Element $(i,j)$ of $\mathcal{I}(\boldsymbol{\theta})$ is given by

$$-\mathrm{E}\left[\frac{\partial^2}{\partial\theta_i\partial\theta_j}\ln\mathcal{L}(\boldsymbol{\theta})\right]
= -n\mathrm{E}\left[\frac{\partial^2}{\partial\theta_i\partial\theta_j}\ln p(N_1|\boldsymbol{\theta})\right]
= n\mathrm{E}\left[\frac{\partial}{\partial\theta_i}\ln p(N_1|\boldsymbol{\theta})\,\frac{\partial}{\partial\theta_j}\ln p(N_1|\boldsymbol{\theta})\right]
= n\sum_{k=0}^{+\infty}p(k|\boldsymbol{\theta})\frac{\partial}{\partial\theta_i}\ln p(k|\boldsymbol{\theta})\,\frac{\partial}{\partial\theta_j}\ln p(k|\boldsymbol{\theta}).$$

Thus, $\Sigma_{\widehat{\boldsymbol{\theta}}} = \bigl(\mathcal{I}(\boldsymbol{\theta})\bigr)^{-1}$.

An insight into why this makes sense is that the second derivatives measure the rate of change in the first derivatives, which in turn determines the value of the maximum likelihood estimate. If the first derivatives are changing rapidly near the maximum, then the peak of the likelihood is sharply defined and the maximum is easy to see. In this case, the second derivatives will be large and their inverse small, indicating a small variance of the estimated parameters. If on the other hand the second derivatives are small, then the likelihood function is relatively flat near the maximum and so the parameters are less precisely estimated. The inverse of the second derivatives will produce a large value for the variance of the estimates, indicating low precision of the estimates.

The distribution of $\widehat{\boldsymbol{\theta}}$ is usually difficult to obtain. Therefore we resort to the following asymptotic theory: under mild regularity conditions (including that the true value of the parameter must be interior to the parameter space, that the log-likelihood function must be thrice differentiable, and that the third derivatives must be bounded) that are usually fulfilled, the maximum likelihood estimator $\widehat{\boldsymbol{\theta}}$ has approximately in large samples a multivariate Normal distribution with mean equal to the true parameter $\boldsymbol{\theta}$ and variance-covariance matrix given by the inverse of the information matrix.


Recall that having an $n \times n$ positive definite matrix $M$ and a real vector $\boldsymbol{\mu}$, the random vector $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T$ is said to have the multivariate Normal distribution with mean $\boldsymbol{\mu}$ and variance-covariance matrix $M$ if its probability density function is of the form

$$f_{\boldsymbol{X}}(\boldsymbol{x}) = \frac{1}{\sqrt{(2\pi)^n\det M}}\exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T M^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right), \qquad \boldsymbol{x} \in \mathbb{R}^n. \qquad (1.48)$$

Henceforth, we indicate that the random vector $\boldsymbol{X}$ has the multivariate Normal distribution with probability density function (1.48) as $\boldsymbol{X} \sim \mathcal{N}or(\boldsymbol{\mu}, M)$. A convenient characterization of the multivariate Normal distribution is as follows: $\boldsymbol{X} \sim \mathcal{N}or(\boldsymbol{\mu}, M)$ if, and only if, any random variable of the form $\sum_{i=1}^n\alpha_i X_i$ with $\boldsymbol{\alpha} \in \mathbb{R}^n$ has the univariate Normal distribution.

Coming back to the properties of the maximum likelihood estimator $\widehat{\boldsymbol{\theta}}$, we have that

$$\widehat{\boldsymbol{\theta}} \text{ is approximately } \mathcal{N}or(\boldsymbol{\theta}, \Sigma_{\widehat{\boldsymbol{\theta}}}) \text{ distributed}, \qquad (1.49)$$

that is, the distribution function of $\widehat{\boldsymbol{\theta}}$ can be approximated by integrating the Normal probability density function

$$f(\boldsymbol{s}) = \frac{1}{\sqrt{(2\pi)^{\dim(\boldsymbol{\theta})}\det\Sigma_{\widehat{\boldsymbol{\theta}}}}}\exp\left(-\frac{1}{2}(\boldsymbol{s}-\boldsymbol{\theta})^T\Sigma_{\widehat{\boldsymbol{\theta}}}^{-1}(\boldsymbol{s}-\boldsymbol{\theta})\right), \qquad \boldsymbol{s} \in \mathbb{R}^{\dim(\boldsymbol{\theta})}.$$

Attribute (1.49) says that maximum likelihood estimators converge in distribution to a Normal with mean equal to the population value of the parameter and variance-covariance matrix equal to the inverse of the information matrix. This means that regardless of the distribution of the variable of interest the maximum likelihood estimator of the parameters will have a multivariate Normal distribution. Thus, a variable may be Poisson distributed, but the maximum likelihood estimate of the Poisson mean will be asymptotically Normally distributed, and likewise for any distribution. Note however that in the Poisson case, the exact distribution of the maximum likelihood estimator of the parameter $\lambda$ derived in Example 1.2 can easily be obtained from the stability of the Poisson family under convolution.

Invariance

A natural question is how the parameterization of a likelihood affects the resulting inference. Maximum likelihood has the property that any transformation of a parameter can be estimated by the same transformation of the maximum likelihood estimate of that parameter. This provides substantial flexibility in how we parameterize our models while guaranteeing that we will get the same result if we start with a different parameterization.

The invariance property can be stated formally as follows: if $\eta = t(\boldsymbol{\theta})$, where $t(\cdot)$ is a one-to-one transformation, then the maximum likelihood estimator of $\eta$ is $t(\widehat{\boldsymbol{\theta}})$. In particular, the maximum likelihood estimator of $p(k|\boldsymbol{\theta})$ is simply $p(k|\widehat{\boldsymbol{\theta}})$, that is,

$$\widehat{p}(k) = p(k|\widehat{\boldsymbol{\theta}}).$$


1.5.3 Computing the Maximum Likelihood Estimators with the Newton–Raphson Algorithm

Calculation of the maximum likelihood estimators often requires iterative procedures. Let $H(\boldsymbol{\theta})$ denote the Hessian (or matrix of second derivatives) of the log-likelihood function, with elements

$$H_{ij}(\boldsymbol{\theta}) = \frac{\partial^2}{\partial\theta_i\partial\theta_j}L(\boldsymbol{\theta}) = \frac{\partial}{\partial\theta_i}U_j(\boldsymbol{\theta}) = \sum_{k=0}^{k_{\max}} f_k\frac{\partial^2}{\partial\theta_i\partial\theta_j}\ln p(k|\boldsymbol{\theta}) \qquad (1.50)$$

for $i, j = 1, \ldots, \dim(\boldsymbol{\theta})$. For $\boldsymbol{\theta}^\star$ close enough to $\widehat{\boldsymbol{\theta}}$, a first-order Taylor expansion gives

$$\boldsymbol{0} = U(\widehat{\boldsymbol{\theta}}) \approx U(\boldsymbol{\theta}^\star) + H(\boldsymbol{\theta}^\star)\bigl(\widehat{\boldsymbol{\theta}} - \boldsymbol{\theta}^\star\bigr),$$

yielding

$$\widehat{\boldsymbol{\theta}} \approx \boldsymbol{\theta}^\star - H^{-1}(\boldsymbol{\theta}^\star)U(\boldsymbol{\theta}^\star).$$

Starting from an appropriate initial value $\boldsymbol{\theta}^{(0)}$, the Newton–Raphson algorithm is based on the recurrence relation

$$\widehat{\boldsymbol{\theta}}^{(r+1)} = \widehat{\boldsymbol{\theta}}^{(r)} - H^{-1}\bigl(\widehat{\boldsymbol{\theta}}^{(r)}\bigr)U\bigl(\widehat{\boldsymbol{\theta}}^{(r)}\bigr). \qquad (1.51)$$

This result provides the basis for an iterative approach for computing the maximum likelihood estimator known as the Newton–Raphson technique. Given a trial value, we use (1.51) to obtain an improved estimate and repeat the process until the elements of the vector of first derivatives are sufficiently close to zero. This procedure tends to converge quickly if the log-likelihood is well-behaved in a neighbourhood of the maximum and if the starting value is reasonably close to the maximum likelihood estimator.
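A compact sketch of iteration (1.51); the score and Hessian are supplied as callables, and the Poisson likelihood of Example 1.2 (whose answer is known in closed form) serves as a check on hypothetical data:

```python
import numpy as np

def newton_raphson(score, hessian, theta0, tol=1e-8, max_iter=50):
    """Iterate (1.51) until the score vector is numerically zero."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        u = score(theta)
        if np.max(np.abs(u)) < tol:
            break
        theta = theta - np.linalg.solve(hessian(theta), u)
    return theta

# One-parameter check on the Poisson log-likelihood of Example 1.2:
k = np.array([0, 0, 1, 0, 2, 0, 0, 1])
d = np.array([1.0, 0.5, 1.0, 1.0, 0.75, 1.0, 0.25, 1.0])
score = lambda t: np.array([-d.sum() + k.sum() / t[0]])
hess = lambda t: np.array([[-k.sum() / t[0] ** 2]])
print(newton_raphson(score, hess, [0.5]), k.sum() / d.sum())
```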

Remark 1.2 (Fisher Scoring) Noting that $\mathcal{I}(\boldsymbol{\theta}) = -\mathrm{E}[H(\boldsymbol{\theta})]$, an alternative procedure is to replace minus the Hessian by its expected value, i.e. the Fisher information matrix. The resulting procedure takes as an improved estimate

$$\widehat{\boldsymbol{\theta}}^{(r+1)} \approx \widehat{\boldsymbol{\theta}}^{(r)} + \mathcal{I}^{-1}\bigl(\widehat{\boldsymbol{\theta}}^{(r)}\bigr)U\bigl(\widehat{\boldsymbol{\theta}}^{(r)}\bigr)$$

and is known as Fisher scoring.


1.5.4 Hypothesis Tests

Sample Distribution of Individual Parameters

Standard hypothesis tests about parameters in maximum likelihood models are handled quite easily, thanks to the asymptotic Normal distribution of the maximum likelihood estimator. Specifically, we use the fact that

$$\frac{\widehat{\theta}_j - \theta_j}{\sigma_{\widehat{\theta}_j}} \text{ is approximately } \mathcal{N}or(0, 1),$$

where the standard deviation $\sigma_{\widehat{\theta}_j}$ of $\widehat{\theta}_j$ is the square root of the $j$th diagonal element of $\Sigma_{\widehat{\boldsymbol{\theta}}} = \mathcal{I}^{-1}(\boldsymbol{\theta})$. Such tests will be useful in Chapter 2 to select the relevant risk factors.

In practice, $\sigma_{\widehat{\theta}_j}$ often involves unknown parameters so that it is estimated by $\widehat{\sigma}_{\widehat{\theta}_j}$, the square root of the $j$th diagonal element of

$$\widehat{\Sigma}_{\widehat{\boldsymbol{\theta}}} = \mathcal{I}^{-1}(\widehat{\boldsymbol{\theta}}).$$

In such a case, $(\widehat{\theta}_j - \theta_j)/\widehat{\sigma}_{\widehat{\theta}_j}$ is approximately Student's $t$ distributed with $n-1$ degrees of freedom. This is the familiar z-score for a standard Normal variable developed in all introductory statistics classes. The Normality of maximum likelihood estimates means that our testing of hypotheses about the parameters is as simple as calculating the z-score and finding the associated $p$-value from a table or by calling a software function.

The hypothesis test is based on Student's $t$-distribution. However, because the maximum likelihood properties are all asymptotic we are unable to address the finite sample distribution. Asymptotically, the Student's $t$-distribution converges to the Normal as the degrees of freedom grow, so that using $\mathcal{N}or(0,1)$ $p$-values in the maximum likelihood test is the same as the $t$-test as long as the number of cases is large enough. Specifically,

$$\frac{\widehat{\theta}_j - \theta_j}{\widehat{\sigma}_{\widehat{\theta}_j}} \text{ is approximately } \mathcal{N}or(0, 1) \qquad (1.52)$$

if the sample size $n$ is large enough.

In addition to the test of hypotheses about a single parameter, there are three classical tests that encompass hypotheses about sets of parameters as well as one parameter at a time: the likelihood ratio, Wald, and Lagrange multiplier tests. All are asymptotically equivalent, but they differ in the ease of implementation depending on the particular case. Here, we will present the Wald, likelihood ratio and Vuong tests, as well as the Score test.

Likelihood Ratio Test

This test is based on a comparison of maximized likelihoods for nested models. Specifically, the null hypothesis $H_0$ corresponds to a constrained model with $\dim(\theta) - j$ parameters, whereas the alternative $H_1$ corresponds to the full model with $\dim(\theta)$ parameters. Most of the time, the test is performed with $j = 1$, so that we compare the full model to a simpler one with one parameter less.

Let $\tilde\theta$ be the maximum likelihood estimator under $H_0$, and let $\hat\theta$ be the maximum likelihood estimator under $H_1$. The likelihood ratio test is based on the ratio of the likelihoods between a full and a restricted (or reduced) nested model with fewer parameters. The restricted model must be nested within (i.e., be a subset of) the full model. The likelihood ratio test statistic is

$$T = 2\ln\frac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\tilde\theta)} = 2\big(L(\hat\theta) - L(\tilde\theta)\big).$$

The evidence against $H_0$ will be strong when $T$ is large.

The Chi-square distribution plays a prominent role in likelihood ratio tests. Recall that the Gamma distribution with $\alpha = \nu/2$ and $\tau = 1/2$ for some positive integer $\nu$ is known as the Chi-square distribution with $\nu$ degrees of freedom (which is denoted as $\chi^2_\nu$), with associated probability density function

$$f(x) = \frac{x^{\nu/2-1}\exp(-x/2)}{\Gamma\!\left(\frac{\nu}{2}\right)2^{\nu/2}}, \qquad x > 0.$$

If $X \sim \chi^2_\nu$ then its mean is $\nu$, and its variance $2\nu$. It is useful to recall that the $\chi^2$ distribution is closely related to the Normal distribution. Specifically, the $\chi^2_\nu$ arises as the distribution of the sum of $\nu$ independent squared $\mathcal{N}or(0,1)$ random variables.

Under $H_0$, the test statistic $T$ is approximately Chi-square distributed with degrees of freedom equal to the number of parameters in the full model minus the number of parameters in the restricted model (that is, with $j$ degrees of freedom) when the sample size $n$ is sufficiently large (and additional mild regularity conditions are fulfilled). Note that the likelihood ratio test requires us to perform two maximum likelihood estimations, one under $H_0$ and another one under $H_1$. When the largest model $H_1$ is misspecified (that is, the data have not been generated by this probability model), the likelihood ratio statistic is no longer necessarily Chi-square distributed under $H_0$.

Unfortunately, there are cases where regularity conditions do not hold for $T$ to be approximately $\chi^2_j$ distributed under $H_0$. In particular this happens when a constrained parameter is on the boundary of the parameter space, e.g., testing Poisson versus Negative Binomial. Here the Poisson is a particular case of the Negative Binomial when the latter has a parameter on the boundary of its parameter space. In this case, the limiting distribution of the statistic $T$ becomes a mixture of Chi-square distributions. We refer the reader to Titterington et al. (1985) for more details about these situations.
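To make the boundary case concrete, here is a minimal Python sketch of the Poisson versus Negative Binomial likelihood ratio test, using the two maximized log-likelihoods reported for Portfolio A in Section 1.6 below. The 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ used for the adjusted p-value is the standard correction when a variance parameter is tested at the boundary value zero; it is an assumption here, one common choice rather than the only possible mixture.

    from scipy.stats import chi2

    # Maximized log-likelihoods under H0 (Poisson) and H1 (Negative Binomial),
    # as reported for Portfolio A in Section 1.6
    L0, L1 = -5579.339, -5534.36

    T = 2 * (L1 - L0)                    # LR statistic, close to the 89.95 of Section 1.6
    p_naive = chi2.sf(T, df=1)           # usual chi-square approximation
    p_adjusted = 0.5 * chi2.sf(T, df=1)  # 0.5*chi2_0 + 0.5*chi2_1 boundary mixture
    print(f"T = {T:.2f}, adjusted p-value = {p_adjusted:.2e}")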

Wald Tests

The Wald test provides an alternative to the likelihood ratio test that requires the estimation of only the full model, not the restricted model. The logic of the Wald test is that if the restrictions are correct then the unrestricted parameter estimates should be close to the values hypothesized under the restricted model.

The Wald test is based on the distribution of a quadratic form of the weighted sum of squared Normal deviates, a form that is known to be Chi-square distributed. Specifically, using (1.49), we can test $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta \neq \theta_0$ with the statistic

$$W = (\hat\theta - \theta_0)^\top \mathcal{I}(\hat\theta)\,(\hat\theta - \theta_0)$$

which is approximately $\chi^2_{\dim(\theta)}$ distributed under $H_0$, in large samples. The test statistic can be interpreted as a measure of the distance between the maximum likelihood estimator $\hat\theta$ and the hypothesized value $\theta_0$. The Wald test leads to the rejection of $H_0$ in favor of $H_1$ if $\hat\theta$ is too far from $\theta_0$. Note that the Wald test suffers from the same problems as likelihood ratio tests when $\theta_0$ lies on the boundary of the parameter space.

Sometimes the calculation of the expected information is difficult, and we may use the observed information instead.
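A minimal sketch of the quadratic form, assuming a two-parameter model; the estimate, null value and information matrix below are hypothetical placeholders.

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical estimate, null value and observed information matrix
    theta_hat = np.array([0.15, 0.90])
    theta_0   = np.array([0.15, 0.00])
    info      = np.array([[7.0e4, 0.0],
                          [0.0,   1.2e2]])

    diff = theta_hat - theta_0
    W = diff @ info @ diff               # W = (theta_hat - theta_0)' I (theta_hat - theta_0)
    p_value = chi2.sf(W, df=len(theta_hat))
    print(f"W = {W:.2f}, p-value = {p_value:.3g}")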

Score Tests

Using the asymptotic theory, we have that the score $U(\theta)$ is approximately $\mathcal{N}or\big(0, \mathcal{I}(\theta)\big)$ distributed. Therefore we can test $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta \neq \theta_0$ with the statistic

$$Q = U(\theta_0)^\top \mathcal{I}^{-1}(\theta_0)\,U(\theta_0)$$

which is approximately $\chi^2_{\dim(\theta)}$ distributed under $H_0$, in large samples.

The advantage of the score test is that the calculation of the maximum likelihood estimator $\hat\theta$ is bypassed. Moreover, it remains applicable even if $\theta_0$ lies on the boundary of the parameter space.
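In the same hedged spirit, a short sketch of the score statistic with hypothetical inputs; note that only quantities evaluated at $\theta_0$ are needed, so no maximization is performed.

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical score vector and expected information, both evaluated at theta_0
    U0 = np.array([12.3, -4.1])
    I0 = np.array([[250.0, 30.0],
                   [30.0,  80.0]])

    Q = U0 @ np.linalg.solve(I0, U0)     # Q = U(theta_0)' I(theta_0)^{-1} U(theta_0)
    p_value = chi2.sf(Q, df=len(U0))
    print(f"Q = {Q:.2f}, p-value = {p_value:.3g}")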

Vuong Test

The Chi-square approximation to the distribution of the likelihood ratio test statistic is valid only for testing restrictions on the parameters of a statistical model (i.e., $H_0$ and $H_1$ are nested hypotheses). With non-nested models, we cannot make use of likelihood ratio tests for model comparison. In this case, information criteria like AIC or (S)BIC are useful, as well as the Vuong test for non-nested models. Recall that the AIC (Akaike Information Criterion) is given by

$$\mathrm{AIC} = -2L(\hat\theta) + 2\dim(\theta)$$

and the BIC (Bayesian Information Criterion) is given by

$$\mathrm{BIC} = -2L(\hat\theta) + \ln(n)\dim(\theta).$$

Both criteria equal minus two times the maximum log-likelihood, penalized by a function of the number of parameters and, for the BIC, of the sample size.
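For example, both criteria can be computed directly from a maximized log-likelihood. The sketch below uses the Portfolio A log-likelihoods reported in Section 1.6 further on, with one parameter for the Poisson model and two for each mixed Poisson model, and $n = 14\,505$ policies.

    import math

    # (maximized log-likelihood, number of parameters) from Section 1.6
    models = {"Poisson":                  (-5579.339, 1),
              "Negative Binomial":        (-5534.36,  2),
              "Poisson-Inverse Gaussian": (-5534.28,  2),
              "Poisson-LogNormal":        (-5534.44,  2)}
    n = 14505   # number of policies in Portfolio A

    for name, (loglik, k) in models.items():
        aic = -2 * loglik + 2 * k
        bic = -2 * loglik + math.log(n) * k
        print(f"{name:>24}: AIC = {aic:.2f}, BIC = {bic:.2f}")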

Vuong (1989) proposed a likelihood ratio-based statistic for testing the null hypothesis that the competing models are equally close to the true data generating process against the alternative that one model is closer. Consider two statistical models given by the probability mass functions $p(\cdot;\theta)$ and $q(\cdot;\gamma)$ with $\dim(\theta) = \dim(\gamma)$, and define the likelihood ratio statistic for the model $p(\cdot;\theta)$ against $q(\cdot;\gamma)$ as

$$LR(\hat\theta_n, \hat\gamma_n) = \sum_{k=0}^{\max} f_k \ln\frac{p(k;\hat\theta_n)}{q(k;\hat\gamma_n)}$$

where $\hat\theta_n$ and $\hat\gamma_n$ are the maximum likelihood estimators in each model based on the sample $k_1, \ldots, k_n$, and $f_k$ is defined as in (1.46).

If both models are strictly non-nested (so that standard likelihood ratio tests do not apply) then, under $H_0$,

$$\frac{LR(\hat\theta_n, \hat\gamma_n)}{\hat\omega_n \sqrt{n}} \text{ is approximately } \mathcal{N}or(0,1) \text{ distributed}$$

where

$$\hat\omega_n^2 = \frac{1}{n}\sum_{i=1}^n\left(\ln\frac{p(k_i;\hat\theta_n)}{q(k_i;\hat\gamma_n)}\right)^{\!2} - \left(\frac{1}{n}\sum_{i=1}^n \ln\frac{p(k_i;\hat\theta_n)}{q(k_i;\hat\gamma_n)}\right)^{\!2}.$$

This provides a very simple test for model selection. Specifically, the actuary chooses a critical value $z_\alpha$ from the $\mathcal{N}or(0,1)$ distribution for some significance level $\alpha$. If the value of the test statistic is higher than $z_\alpha$ then he rejects the null hypothesis that the models are equivalent in favour of $p(\cdot;\theta)$ being better than $q(\cdot;\gamma)$. If the test statistic is smaller than $-z_\alpha$ then he rejects the null hypothesis in favour of $q(\cdot;\gamma)$ being better than $p(\cdot;\theta)$. Finally, if the test statistic is between $-z_\alpha$ and $z_\alpha$ then we cannot discriminate between the two competing models given the data.

The test statistic can be adjusted if the competing models do not have the same number of parameters, i.e. $\dim(\theta) \neq \dim(\gamma)$ (which is not the case in this chapter).
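As an illustration, the following sketch computes the Vuong statistic from per-policy log-probabilities under the two competing models; summing the individual log-ratios over the $n$ policies is equivalent to the grouped sum over claim counts weighted by the frequencies $f_k$. The function name and inputs are illustrative, not from the book.

    import numpy as np

    def vuong_statistic(logp, logq):
        """Vuong statistic for two strictly non-nested models with dim(theta) = dim(gamma).
        logp, logq: per-observation log-probabilities under models p and q."""
        m = logp - logq                        # individual log-likelihood ratios
        n = len(m)
        omega = np.sqrt(np.mean(m**2) - np.mean(m)**2)
        return m.sum() / (omega * np.sqrt(n))  # compare with Nor(0,1) quantiles

    # Usage: z = vuong_statistic(logp, logq); reject equivalence if |z| > z_alpha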

1.6 Numerical Illustration

Here, we consider a Belgian motor third party liability insurance portfolio observed during the year 1997 (henceforth referred to as Portfolio A). The observed claim distribution is given in Table 1.1. A thorough description of this portfolio is deferred to Section 2.2.

We see from Table 1.1 that the total exposure is not equal to the number of policies, due to the fact that some policies have not been in force during the full observation period (12 months). Some of them have been cancelled before the end of the observation period; others have been written after the start of the observation period.

Let us now fit the observations to the Poisson, the Negative Binomial, the Poisson-Inverse Gaussian and the Poisson-LogNormal distributions. The results are summarized below:

Table 1.1 Observed claim distribution in Portfolio A.

Number of claims   Number of policies   Total exposure (in years)
0                  12 962               10 545.94
1                   1 369                1 187.13
2                     157                  134.66
3                      14                   11.08
4                       3                    2.52
Total              14 505               11 881.35


Poisson: the maximum likelihood estimate of the Poisson mean is $\hat\lambda = 0.1462$. The 95 % confidence interval for $\lambda$ is (0.1395; 0.1532). The log-likelihood of the Poisson model is −5579.339.
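These figures can be reproduced from Table 1.1, since the Poisson maximum likelihood estimate is the ratio of total claims to total exposure. In the sketch below, the confidence interval is computed on the log scale, a delta-method construction that we assume here; it reproduces the interval quoted above up to rounding.

    import math

    # Table 1.1: (number of claims, number of policies, total exposure in years)
    table = [(0, 12962, 10545.94), (1, 1369, 1187.13), (2, 157, 134.66),
             (3, 14, 11.08), (4, 3, 2.52)]

    claims   = sum(k * n for k, n, _ in table)   # 1737 claims in total
    exposure = sum(e for *_, e in table)         # 11881.35 policy-years

    lam = claims / exposure                      # MLE: 0.1462
    se  = math.sqrt(lam / exposure)              # asymptotic standard error
    lo  = lam * math.exp(-1.96 * se / lam)       # about 0.1395
    hi  = lam * math.exp(+1.96 * se / lam)       # about 0.1532
    print(f"lambda = {lam:.4f}, 95% CI = ({lo:.4f}; {hi:.4f})")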

Negative Binomial: the maximum likelihood estimate of the mean is $\hat\lambda = 0.1474$ and the dispersion parameter is $\hat{a} = 0.889$. The variance of the random effect is estimated as $\hat{V} = 1/\hat{a} = 1.1253$. The respective 95 % confidence intervals are (0.1402; 0.1551) for $\lambda$ and (0.8144; 1.4361) for $V$. The log-likelihood of the Negative Binomial model is −5534.36, which is better than the Poisson log-likelihood.

Poisson-Inverse Gaussian: the maximum likelihood estimate of the mean is $\hat\lambda = 0.1475$, and the variance of the random effect is estimated as $\hat{V} = 1.1770$. The respective 95 % confidence intervals are (0.1402; 0.1552) for $\lambda$ and (0.8258; 1.5282) for $V$. The log-likelihood of the Poisson-Inverse Gaussian model is −5534.28, which is better than the Poisson log-likelihood and almost equivalent to the Negative Binomial log-likelihood.

Poisson-LogNormal: the maximum likelihood estimate of the mean is $\hat\lambda = 0.1476$, and $\hat\sigma^2 = 0.7964$. The variance of the random effect is estimated as $\hat{V} = 1.2175$. The respective 95 % confidence intervals are (0.1403; 0.1553) for $\lambda$ and (0.6170; 0.9758) for $\sigma^2$. The log-likelihood of the Poisson-LogNormal model is −5534.44, which is better than the Poisson log-likelihood and almost equivalent to the Negative Binomial and Poisson-Inverse Gaussian log-likelihoods.

The results have been obtained with the help of the SAS® procedure GENMOD for the Poisson and Negative Binomial distributions (details will be given in the next chapter) and by a direct maximization of the log-likelihood using the Newton–Raphson procedure (coded in the SAS® environment IML) in the Poisson-Inverse Gaussian and Poisson-LogNormal cases.

It is interesting to note that the values of $\hat\lambda$ are different in the Poisson and mixed Poisson models. If all the risk exposures were equal then these values would have been the same in all cases.

Let us now compare the Poisson fit to Portfolio A with each of the mixed Poisson fits. To this end, we use a likelihood ratio test, with an adjusted Chi-square approximation (since the Poisson case is at the border of the mixed Poisson family). Comparing the Poisson fit to any of the three mixed Poisson models leads to a clear rejection of the former:

Poisson against Negative Binomial: likelihood ratio test statistic of 89.95, with a p-value less than $10^{-10}$.

Poisson against Poisson-Inverse Gaussian: likelihood ratio test statistic of 90.12, with a p-value less than $10^{-10}$.

Poisson against Poisson-LogNormal: likelihood ratio test statistic of 89.80, with a p-value less than $10^{-10}$.

The rejection of the Poisson assumption in favour of a mixed Poisson model is interpreted as a sign that the portfolio is composed of different types of drivers (i.e. the portfolio is heterogeneous).

Now, comparing the three mixed Poisson models with the Vuong test gives:

Negative Binomial against Poisson-Inverse Gaussian: Vuong test statistic equal to −0.1086, with p-value 91.36 %.

Poisson-LogNormal against Negative Binomial: Vuong test statistic equal to −0.0435, with p-value 96.54 %.

Poisson-LogNormal against Poisson-Inverse Gaussian: Vuong test statistic equal to −0.3254, with p-value 74.48 %.

We cannot discriminate between the three competing models given the data: they all fit the data equally well.

Remark 1.3 (Chi-Square Goodness-of-Fit Tests) In many papers appearing in the actuarial literature devoted to the analysis of claim numbers, as well as in most empirical studies, Chi-square goodness-of-fit tests are performed to select the optimal model. However, this approach neglects the exposures-to-risk (acting as if all the policies were in the portfolio for the whole year). We do not rely on Chi-square goodness-of-fit tests here since they do not allow for unequal risk exposures. Note that the vast majority of papers appearing in the actuarial literature disregard risk exposures (and proceed as if all the risk exposures were equal to 1).

1.7 Further Reading and Bibliographic Notes

1.7.1 Mixed Poisson Distributions

Mixed Poisson distributions are often used to model insurance claim numbers. The statistical analysis of counting random variables is described in much detail in Johnson et al. (1992). An excellent introduction to statistical inference is provided by Franklin (2005). In the actuarial literature, Klugman et al. (2004) provide a good account of statistical inference applied to insurance data sets, and in particular of the analysis of counting random variables. Generating functions are described in Kendall & Stuart (1977) and Feller (1971).

The axiomatic approach for which the (mixed) Poisson distribution is the counting distribution for a (mixed) Poisson process is presented in Grandell (1997). Mixture models are discussed in Lindsay (1995); see also Titterington et al. (1985). Let us mention the work by Karlis (2005), who applied the EM algorithm for maximum likelihood estimation in mixed Poisson models.

1.7.2 Survey of Empirical Studies Devoted to Claim Frequencies

Kestemont & Paris (1985), using mixtures of Poisson processes, defined a large class of probability distributions and developed an efficient method for estimating their parameters. For the six data sets in Gossiaux & Lemaire (1981), they proposed a law depending on three parameters and they always obtained extremely good fits. As particular cases of the laws introduced in Kestemont & Paris (1985), we find the ordinary Poisson distribution, the Poisson-Inverse Gaussian distribution, and the Negative Binomial distribution. Tremblay (1992) used the Poisson-Inverse Gaussian distribution. Willmot (1987) compared the Poisson-Inverse Gaussian distribution to the Negative Binomial one and concluded that the fits are superior with the Poisson-Inverse Gaussian in all the six cases studied by Gossiaux & Lemaire (1981). See also the paper by Besson & Partrat (1990).

Ruohonen (1987) considered a model for the claim number process. This model is a mixed Poisson process with a three-parameter Gamma distribution as the structure function and is compared with the two-parameter Gamma model giving the Negative Binomial distribution. He fitted his model to some data that can be found in the actuarial literature and the results were satisfying. Panjer (1987) proposed the Generalized Poisson-Pascal distribution (in fact, the Hofmann distribution), which includes three parameters, for the modelling of the number of automobile claims. The fits obtained were satisfactory, too. Note that the Pólya-Aeppli, the Poisson-Inverse Gaussian and the Negative Binomial are special cases of this distribution.

Consul (1990) tried to fit the same six data sets by the Generalized Poisson distribution. Although the Generalized Poisson law is not rejected by a Chi-square test, the fits obtained by Kestemont & Paris (1985), for instance, are always better. Furthermore, Elvers (1991) reported that the Generalized Poisson distribution did not fit the data observed in a motor third party liability insurance portfolio very well. Islam & Consul (1992) suggested the Consul distribution as a probabilistic model for the distribution of the number of claims in automobile insurance. These authors approximated the chance mechanism which produces vehicle accidents by a branching process. They fitted the model to the data sets used by Panjer (1987) and by Gossiaux & Lemaire (1981). Note that this model deals only with cars in accidents; consequently, the zero-class has to be excluded. The fitted values seem good. However, this has to be considered cautiously, due to the comments by Sharif & Panjer (1993), who found serious flaws embedded in the fitting of the Consul model, in particular the very restricted parameter space and some theoretical problems in the derivation of the maximum likelihood estimators. They refer to other simple probability models, such as the Generalized Poisson-Pascal or the Poisson-Inverse Gaussian, whose fits were found quite satisfying.

Denuit (1997) demonstrated that the Poisson-Goncharov distribution introduced by Lefèvre & Picard (1996) provides an appropriate probability model to describe the annual number of claims incurred by an insured motorist. Estimation methods were proposed, and the Poisson-Goncharov distribution successfully fitted the six observed claims distributions in Gossiaux & Lemaire (1981), as well as other insurance data sets.

1.7.3 Semiparametric Approach

Traditionally, actuaries have assumed that the distribution of $\Theta$ values among all drivers is well approximated by a parametric distribution, be it Gamma, Inverse Gaussian or LogNormal. However, there is no particular reason to believe that $F_\Theta$ belongs to some specified parametric family of distributions. Therefore, it seems interesting to resort to a nonparametric estimator of $F_\Theta$.

There have been several attempts to estimate the structure function in a mixed Poisson model nonparametrically. Most of them include the annual claim frequency in the random effect, and thus work with $\tilde\Theta = \lambda\Theta$. Assuming that $\tilde\Theta$ has a finite number of support points and that its probability distribution is uniquely determined by its moments, Tucker (1963), suitably made precise by Lindsay (1989a,b), suggested estimating the support points of $\tilde\Theta$ and the corresponding probability masses by solving a moment system. This estimator was then smoothed by Carrière (1993b) using a mixture of LogNormal distribution functions where all the parameters are estimated by a method of moments.

In a seminal paper, Simar (1976) gave a detailed description of the nonparametric maximum likelihood estimator of $F_{\tilde\Theta}$, as well as an algorithm for its computation. The nonparametric maximum likelihood estimator has a discrete distribution, and Simar (1976) obtained an upper bound for the size of its support.

Walhin & Paris (1999) showed that, although the nonparametric maximum likelihood estimator is powerful for evaluation of functionals of claim counts, it is not suitable for ratemaking, because it is purely discrete. For this reason, Denuit & Lambert (2001) proposed a smoothed version of the nonparametric maximum likelihood estimator. This approach is somewhat similar to the route followed by Carrière (1993b), who proposed to smooth the Tucker–Lindsay moment estimator with a LogNormal kernel.

Young (1997) applied nonparametric density estimation techniques to estimate $F_{\tilde\Theta}$. Because the actuary only observes claim numbers and not the conditional mean, an estimate of the underlying risk parameter relating to the $i$th policy of the portfolio is the average claim number $\bar{x}_i$ (i.e. the total number of claims generated by this policy divided by the length of the exposure period). Therefore, given a kernel $K$, Young (1997) suggested estimating $dF_{\tilde\Theta}$ by

$$d\hat{F}_{\tilde\Theta}(t) = \sum_{i=1}^n \frac{w_i}{h_i}\, K\!\left(\frac{t - \bar{x}_i}{h_i}\right)$$

in which $h_i$ is a positive parameter called the bandwidth and $w_i$ is a weight (taken to be the number of years the $i$th policy is in force divided by the total number of policy-years for the collective). Young (1997) suggested using the Epanechnikov kernel and determined the $h_i$s in order to minimize the mean integrated squared error (by reference to a Normal prior).
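A minimal sketch of this kernel estimator (Python, with hypothetical data; the bandwidths are fixed to a common value for simplicity rather than optimized as in Young, 1997):

    import numpy as np

    def epanechnikov(u):
        """Epanechnikov kernel."""
        return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

    # Hypothetical average claim numbers and exposures (in policy-years)
    xbar = np.array([0.0, 0.0, 1.0, 0.5, 2.0])   # claims / exposure per policy
    expo = np.array([1.0, 0.5, 1.0, 2.0, 1.0])

    w = expo / expo.sum()              # weights w_i: share of total policy-years
    h = np.full_like(xbar, 0.4)        # bandwidths h_i (common value here)

    def density_estimate(t):
        """Kernel estimate of the density of the random effect at point t."""
        return float(np.sum(w / h * epanechnikov((t - xbar) / h)))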


2
Risk Classification

2.1 Introduction

2.1.1 Risk Classification, Regression Models and Random Effects

Motor ratemaking is essentially about classifying policies according to their risk characteristics. The classification variables are called a priori variables (as their values can be determined before the policyholder starts to drive). In motor insurance, they include the age, gender and occupation of the policyholders, the type and use of their car, the place where they reside and sometimes even the number of cars in the household, marital status, or the colour of the vehicle.

These observable risk characteristics are typically seen as nonrandom covariates. Other risk characteristics are unobservable and must be seen as unknown parameters or, in the vein of credibility theory, latent variables with a common distribution. The literature about premium rating in motor insurance comprises two mainstream approaches: (i) the first one disregards observable covariates altogether and lumps all the individual characteristics into random latent variables, and (ii) the second one disregards random individual risk characteristics and tries instead to catch all relevant individual variations by covariates. Chapter 1 adopted the first approach. The present chapter combines both views, employing contemporary, advanced data analysis.

If the data are subdivided into risk classes determined by a priori variables, actuaries work with figures which are small in exposure and claim numbers. Therefore, simple averages will be suspect and regression models are needed. Regression analyses the relationship between one variable and another set of variables. This relationship is expressed as an equation that predicts a response variable (the expected number of claims filed by a given policyholder) from a function of explanatory variables and parameters (involving a linear combination of these explanatory variables and parameters, called a linear predictor). The parameters are estimated so that a measure of the goodness-of-fit is optimized (the log-likelihood, in most cases). Actuaries use regression techniques to predict the expected number of claims knowing some information about the policyholders, vehicles and types of contract. It is worth mentioning that even with all the covariates included here, there still remain substantial risk differentials between individual drivers (due to hidden characteristics, like temper and skill, aggressiveness behind the wheel, knowledge of the highway code, etc.). Random effects are added to the linear predictor on the score scale to take this residual heterogeneity into account, reconciling the two approaches (i)–(ii) mentioned above.

In nonlife business, the pure premium is the expected cost of all the claims that policyholders will file during the coverage period (under the assumption of the Law of Large Numbers). The computation of this premium relies on a statistical model incorporating all the available information about the risk. The technical tariff aims to evaluate as accurately as possible the pure premium for each policyholder via regression techniques. It is well known that market premiums may differ from those computed by actuaries; see, e.g., Coutts (1984) for a discussion. In that respect, the overall market position of the company compared to its competitors with regard to growth and pricing is crucial.

Sometimes, motor ratemaking is performed on panel data. Bringing several observation periods together has some advantages: it increases the sample size and avoids granting too much importance to a single calendar year (during which particular weather conditions could have increased or decreased the number of traffic accidents, for instance). However, this induces some dependence in the data, since observations relating to the same policyholder across time are expected to be correlated. The analysis of correlated data with Poisson marginals arising from repeated measurements can be performed with the help of Generalized Estimating Equations (GEEs). GEEs provide a practical method with reasonable statistical efficiency to analyse such panel data. GEEs also give initial values for maximum likelihood procedures in models for longitudinal data.

2.1.2 Risk Sharing in Segmented Tariffs

The following discussion is inspired by the paper by De Wit & Van Eeghen (1984). Consider a portfolio of $n$ policies from motor third party liability insurance. The random variable $Y_i$ models a quantity of actuarial interest for policy $i$ (for instance the amount of a claim, the aggregate claim amount in one period or the number of accidents at fault reported by policyholder $i$ during one period). In order to explain the outcomes of $Y_i$, the actuary has observable covariates $X_i^\top = (X_{i1}, X_{i2}, \ldots)$ at his disposal (e.g., age, gender and occupation of policyholder $i$, the place where he resides, type and use of his car). However, $Y_i$ also depends on a sequence of unknown characteristics $Z_i^\top = (Z_{i1}, Z_{i2}, \ldots)$ (e.g., annual mileage, accuracy of judgment, aggressiveness behind the wheel, drinking behaviour, etc.). Some of these quantities are unobservable, others cannot be measured in a cost-efficient way.

The 'true' premium for policyholder $i$ is $E[Y_i|X_i, Z_i]$. It is the function $g$ of $X_i$ and $Z_i$ that is 'closest' to $Y_i$, in the sense that $E[(Y_i - g(X_i, Z_i))^2]$ is minimum for $g(X_i, Z_i) = E[Y_i|X_i, Z_i]$. If the insurer charges $E[Y_i|X_i, Z_i]$ to policyholder $i$, then the policyholders pay premiums that absorb the inter-individual variations (that is, the variations of the premiums due to the modifications in personal characteristics $X_i$ and $Z_i$, the magnitude of which is quantified by $V\big[E[Y_i|X_i, Z_i]\big]$). The company covers the purely random intra-individual risks (that is, the random fluctuations of $Y_i$, which are quantified by the variance $E\big[V[Y_i|X_i, Z_i]\big]$ of the outcomes of $Y_i$ once the personal characteristics $X_i$ and $Z_i$ have been fixed). Risk sharing can be summarized as follows: using the variance decomposition formula and then taking expectations gives

$$V[Y_i] = \underbrace{E\big[V[Y_i|X_i, Z_i]\big]}_{\to\text{insurer}} + \underbrace{V\big[E[Y_i|X_i, Z_i]\big]}_{\to\text{policyholder}}.$$

Of course, since the elements of $Z_i$ are unknown to the insurer, the situation described above is purely theoretical. Since the company only knows $X_i$, the insurer can only charge $E[Y_i|X_i]$. The risk sharing is now

$$V[Y_i] = \underbrace{E\big[V[Y_i|X_i]\big]}_{\to\text{insurer}} + \underbrace{V\big[E[Y_i|X_i]\big]}_{\to\text{policyholder}}.$$

The part of the variance supported by the insurer is now larger, since residual heterogeneity remains with the company. To see this, let us write

$$E\big[V[Y_i|X_i]\big] = E\big[V[Y_i|X_i, Z_i]\big] + E\Big[V\big[E[Y_i|X_i, Z_i]\,\big|\,X_i\big]\Big].$$

The first term in this sum, i.e. $E\big[V[Y_i|X_i, Z_i]\big]$, represents the purely random fluctuations of the risk and is supported by the insurance company in application of the very basic principle of insurance. On the contrary, the second term represents the variations of the expected claims due to the unknown risk characteristics $Z_i$. This quantity should be corrected by an experience rating mechanism (as discussed in Chapters 3 and 4).

We can now clearly see the link existing between a priori and a posteriori ratemaking. The idea behind experience rating is that past claims experience reveals the hidden features $Z_i$. Let $Y_i^\leftarrow$ denote the past claims experience available about $Y_i$. The idea is that the information contained in $(X_i, Y_i^\leftarrow)$ becomes comparable to $(X_i, Z_i)$ as time goes on. Therefore, the a posteriori premium is $E[Y_i|X_i, Y_i^\leftarrow]$. Experience rating is based on a 'crime and punishment' mechanism: claim-free policyholders are rewarded by premium discounts called bonuses, whereas policyholders reporting one or more accidents at fault are penalized by premium surcharges called maluses.

In a priori ratemaking, the actuary aims to identify the best predictors $X_i$ and to compute the risk premium $E[Y_i|X_i]$. In a posteriori ratemaking, the actuary aims to compute premium corrections according to past claims history $Y_i^\leftarrow$ in order to reflect the unavailable information contained in $Z_i$. A posteriori ratemaking techniques are discussed in the next chapter.

2.1.3 Bonus Hunger and Censoring

In this chapter, we fit statistical models for the number of claims subject to the existing rules adopted by the insurance company to penalize reported claims. Due to bonus-malus mechanisms, some claims are not reported to the company, because policyholders think it is cheaper for them to defray the third party (or to pay for their own costs in first party coverages) to avoid premium surcharges. The data are thus censored in a complicated way, and the conclusions of the actuarial analysis are valid only if the existing rules are kept unchanged. Analysing insurance data, the actuary is able to draw conclusions about the number of claims filed by policyholders subject to a specific a posteriori ratemaking mechanism. The actuary is not able to draw any conclusions about the number of accidents caused by these insured drivers. We will come back to this important issue in Chapter 5.

2.1.4 Agenda

This chapter is devoted to a priori ratemaking, and focusses on claim frequencies. To make the discussion more concrete, we analyse the statistics from a couple of motor insurance portfolios. We first work with cross-sectional data (i.e. data gathered during one year of observation) from a Belgian motor third party liability insurance portfolio observed during the year 1997 (called Portfolio A, already used in Chapter 1). We also show how it is possible to build a ratemaking on the basis of panel data. To this end, we use another Belgian portfolio (called Portfolio B) for which the data have been collected during 3 years (from 1997 to 1999).

To fix the ideas, in Section 2.2 we present the data observed during the year 1997 and the different explanatory variables available for Portfolio A. We give a detailed description of all the variables and we have a first look at their influence on the risk borne by the insurer. Then, in Section 2.3, we show how it is possible to build an a priori ratemaking thanks to a Poisson regression. We illustrate the technique on the data from Portfolio A. Section 2.4 addresses the problem of overdispersion. A random effect is added to the covariates to account for residual heterogeneity (vector $Z_i$ in the preceding discussion). We examine three classical models: the Poisson-Gamma (or Negative Binomial) model, the Poisson-Inverse Gaussian model and the Poisson-LogNormal model. These are extensions of the corresponding models presented in Chapter 1, to incorporate exogenous information about policyholders.

In Section 2.9, we develop ratemaking techniques using panel data. Portfolio B, which has been observed during three consecutive years, is used for the numerical illustrations. GEE and maximum likelihood are used to estimate the parameters involved in models for longitudinal data. The final Section 2.10 offers an extensive discussion of topics not covered in this chapter, together with appropriate references.

2.2 Descriptive Statistics for Portfolio A

2.2.1 Global Figures

The data relate to a Belgian motor third party liability insurance portfolio observed during the year 1997. The data set (henceforth referred to as Portfolio A, for brevity) comprises 14 505 policies. The observed claim number distribution in the portfolio has been described in Table 1.1. The observed mean claim frequency for Portfolio A is 14.6 %.

2.2.2 Available Information

The following information is available on an individual basis: in addition to the number of claims filed by each policyholder (variable Nclaim) and the exposure-to-risk from which these claims originate (i.e. the number of days the policy has been in force during 1997, variable Expo), we know

Age: policyholder's age (four categories: 1 = 'between 18 and 24', 2 = 'between 25 and 30', 3 = 'between 31 and 60', 4 = 'more than 60')

Gender: policyholder's gender (two categories: 1 = 'woman', 2 = 'man')

District: kind of district where the policyholder lives (two categories: 1 = 'urban', 2 = 'rural')

Use: use of the car (two categories: 1 = 'private use, i.e. leisure and commuting', and 2 = 'professional use')

Split: premium split (two categories: 1 = 'premium paid once a year' and 2 = 'premium split up').

In practice, insurers have at their disposal much more information about their policyholders. Here, we focus on these few explanatory variables for pedagogical purposes, to ease the exposition of ideas.

We see that all the explanatory variables listed above are categorical, i.e. they can be used to partition the portfolio into homogeneous classes with respect to these variables. Such explanatory variables are called factors, each factor having a number of levels. In practice, there are also continuous covariates; we explain in the last section of this chapter how to deal with such explanatory variables.

2.2.3 Exposure-to-Risk

The majority of policies are in force during the whole year. However, in some cases, the observation period does not last the entire year. This is the case, for instance, for new policyholders entering the portfolio during the observation period, and in case of policy cancellations. It is also common in practice to start a new period if some changes occur in the observable characteristics of the policies (for instance, the policyholder moves from a rural to an urban area and the company uses the rating variable District). The policy is then represented as two different lines in the data base, and observations are recorded separately for the two periods (the policy number allows the actuary to track these changes). Note that independence is lost in this case, and allowance for panel data is preferable (as in Section 2.9). This variety of situations is taken into account in the Poisson process, by multiplying the annual expected claim frequency by the length of the observation period, as explained in Chapter 1.

In Portfolio A, the average coverage period is 298.98 days. Figure 2.1 gives an idea of the distribution of the exposure-to-risk in the portfolio. About 65 % of the policies have been observed during the whole year 1997. Considering the distribution of the exposure-to-risk, we see that policy issuances and lapses are randomly spread over the year: the distribution of the policies in force during less than one year is roughly uniform over [0, 365].

It is worth mentioning that policies that have just been issued often differ from those already in the portfolio. This is why it may be preferable to conduct a separate analysis for this type of policy.


Figure 2.1 Exposure-to-risk in Portfolio A (number of policyholders per number of months in force).

2.2.4 One-Way Analyses

Age

The age structure of the portfolio is described in Figure 2.2. Most policyholders are middle-aged: 6722 insured drivers (representing 46.4 % of the portfolio) are between 31 and 60. Only 802 insured drivers (representing 5.8 % of the portfolio) are over 60. The young drivers represent 15.3 % of the portfolio (2295 policyholders) and the remaining 4686 insured drivers (32.5 % of the portfolio) are between 25 and 30.

In the preliminary descriptive analysis, the actuary considers the marginal impact of each rating factor; the possible effect of the other explanatory variables is thus disregarded. Let us assume for a while that the claim frequencies only depend on Age. If the occurrence of the claims filed by the policyholders conforms with a Poisson process, the number $N_i$ of claims reported by policyholder $i$ obeys the Poisson distribution with mean $d_i\lambda_{Age(i)}$, where $d_i$ is the exposure-to-risk (i.e., the length of the coverage period) for policyholder $i$, $Age(i)$ is the age category to which policyholder $i$ belongs (1, 2, 3 or 4) and the $\lambda_j$s, $j = 1, 2, 3, 4$, are the annual expected claim frequencies for the four age classes.


Figure 2.2 Composition of Portfolio A with respect to Age (left panel) and observed annual claim frequencies according to Age (right panel).

Assuming that the numbers of claims filed by the policyholders in the portfolio are independent random variables, the likelihood then becomes

$$\mathcal{L}(\lambda_1,\lambda_2,\lambda_3,\lambda_4) = \prod_{i=1}^n \exp\big(-d_i\lambda_{Age(i)}\big)\frac{\big(d_i\lambda_{Age(i)}\big)^{k_i}}{k_i!} \propto \prod_{j=1}^4 \exp\Big(-\lambda_j \sum_{i:Age(i)=j} d_i\Big)\,\lambda_j^{\sum_{i:Age(i)=j} k_i}$$

where $k_i$ denotes the observed number of claims for policyholder $i$, and '$\propto$' reads 'is proportional to'. Differentiating $L(\lambda_1,\lambda_2,\lambda_3,\lambda_4) = \ln\mathcal{L}(\lambda_1,\lambda_2,\lambda_3,\lambda_4)$ with respect to $\lambda_j$ and setting the derivative equal to 0 gives

$$-\sum_{i:Age(i)=j} d_i + \frac{1}{\lambda_j}\sum_{i:Age(i)=j} k_i = 0.$$


The maximum likelihood estimator of $\lambda_j$ is then obtained as

$$\hat\lambda_j = \frac{\sum_{i:Age(i)=j} k_i}{\sum_{i:Age(i)=j} d_i} = \frac{\text{number of claims filed by policyholders in age category } j}{\text{total exposure-to-risk (in years) for age category } j}.$$

Of course, the reliability of $\hat\lambda_j$ depends on the magnitude of the exposure-to-risk appearing in the denominator.
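In code, these estimators are simple exposure-weighted averages. The sketch below uses a few hypothetical policy records of the form (age category, exposure $d_i$ in years, claim count $k_i$).

    from collections import defaultdict

    # Hypothetical records: (age category, exposure d_i in years, claims k_i)
    policies = [(1, 1.00, 1), (1, 0.50, 0), (2, 0.75, 0),
                (3, 1.00, 0), (3, 1.00, 1), (4, 0.25, 0)]

    claims, exposure = defaultdict(float), defaultdict(float)
    for age, d_i, k_i in policies:
        claims[age]   += k_i
        exposure[age] += d_i

    # MLE per age class: claims divided by exposure-to-risk
    lambda_hat = {j: claims[j] / exposure[j] for j in sorted(exposure)}
    print(lambda_hat)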

We see in Figure 2.2 that the observed annual claim frequency decreases with age. The young drivers are riskier, with an observed annual claim frequency of 21.3 %. Old drivers are safer, with an observed annual claim frequency of 10.8 %. We notice that the policyholders aged between 25 and 30, with an observed annual claim frequency of 15.5 %, tend to report more claims than the policyholders aged between 31 and 60 (observed annual claim frequency of 12.3 %).

The analysis conducted in this paragraph is often referred to as a one-way analysis: the effect of Age on claim frequencies is studied without taking account of the effect of other variables. The major flaw of one-way analyses is that they can be distorted by correlations. For instance, one can imagine that the majority of young policyholders split the payment of the insurance premiums (for budget reasons). If more claims are filed by young drivers, a one-way analysis of Split may show higher claim frequencies for drivers having split their premium payment. However, this may result from the fact that such drivers are in general the high-risk young policyholders. Premium differentials based on one-way analyses of Split and Age would double-count the effect of Age. Multivariate methods (such as the Poisson regression approach discussed below) adjust for correlations between explanatory variables. The correlations existing between explanatory variables explain why the policies are not uniformly distributed over risk classes but cluster in some specified highly populated classes.

Gender

It is common to include the gender of the main driver in the actuarial ratemaking. Note however that some states have banned the use of this rating factor (as well as of age, for instance). The reason is that age and gender are out of the policyholders' control, in contrast to many other covariates (like the power of the car, or the driving area). The latter may thus be freely used for ratemaking purposes, but some limitations are needed for the former.

In Portfolio A, there are 9358 male policyholders (representing 64.7 % of the portfolio) and 5147 female policyholders (representing 35.3 % of the portfolio). Figure 2.3 suggests a higher annual claim frequency for males (observed annual claim frequency of 15.2 %) than for females (observed annual claim frequency of 13.6 %).

District

Figure 2.4 gives the distribution of the policyholders according to the district where they live. We see that 8664 policyholders (representing 59.8 % of the portfolio) live in an urban area and 5841 policyholders (representing 40.2 % of the portfolio) live in a rural one. The urban policyholders have a larger observed annual claim frequency (15.7 %) than the rural ones (13.0 %).


Figure 2.3 Composition of Portfolio A with respect to Gender (left panel) and observed annual claim frequencies according to Gender (right panel).

Use

The majority of the policyholders (12 745 insured drivers, representing 88.0 % of the portfolio) use their vehicle only for leisure and commuting. There are 1760 professional users (representing 12.0 % of the portfolio). Figure 2.5 indicates that the influence of the use of the vehicle on the number of claims is almost negligible, as the observed annual claim frequencies are almost equal for the two categories of insured drivers: 14.6 % for private users and 14.3 % for professional users.

Premium Split

Figure 2.6 indicates that 10 568 policyholders (representing 74.5 % of the portfolio) pay their premium once a year. The remaining 3937 policyholders (representing 25.5 % of the portfolio) pay their premium twice a year, thrice a year or on a monthly basis. Figure 2.6 also shows the influence of the premium split on the number of claims: splitting the premium payment is associated with a considerable increase in the observed annual claim frequency (from 12.4 % to 21.1 %).


Figure 2.4 Composition of Portfolio A with respect to District (left panel) and observed annual claim frequencies according to District (right panel).

2.2.5 Interactions

So far, only the marginal effect of each observed covariate on the claim frequency has been assessed. Besides these one-way analyses, it is also important to account for possible interactions. Often Gender and Age interact, in the sense that the effect of Age on the average claim frequency is different for males than for females. Typically, young male drivers are more dangerous than young female drivers (but the higher risk associated with young male drivers may be due to higher annual mileage, or to other risk factors correlated with being a young male). Formally, two explanatory variables are said to interact when the effect of one factor varies depending on the levels of the other factor. Multivariate models allow for investigation into interaction effects.

This phenomenon can be seen from Figure 2.7. The observed annual claim frequency for young males (ages 18–24) peaks at 23.8 %, whereas young females (ages 18–24) have an observed annual claim frequency similar to males aged 25–30. Both genders become more similar for categories 31–60 and over 60. We have thus detected an Age–Gender interaction in Portfolio A.

Note that standard regression models do not automatically account for interactions (in contrast to correlations between covariates, for which the estimated regression coefficients are adjusted).

Figure 2.5 Composition of Portfolio A with respect to Use (left panel) and observed annual claim frequencies according to Use (right panel).

The reason is as follows: interactions cannot be rendered by linear combinations of the covariates. To account for interactions, nonlinear functions of the covariates (products) are needed, as explained in Example 2.3 below. In ANOVA terminology, one would speak of interaction when the effects are not just additive. The actuary needs to identify the existing interactions at the preliminary exploratory stage, and then define new ratemaking factors combining the levels of the two interacting variables (a minimal sketch of this combination follows below). Inserting these new factors in the regression model then allows us to account for interaction.
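A minimal sketch of such a combined factor (Python/pandas, hypothetical data): the new variable simply concatenates the levels of the two interacting covariates, and dummies built from it then capture the interaction.

    import pandas as pd

    df = pd.DataFrame({"Age":    ["18-24", "31-60", "18-24"],
                       "Gender": ["Male", "Female", "Female"]})

    # One combined rating factor per (Gender, Age) pair, e.g. 'Male:18-24'
    df["Age_Gender"] = df["Gender"] + ":" + df["Age"]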

2.2.6 True Versus Apparent Dependence

The descriptive analysis conducted so far suggests that some observed characteristics may influence the number of claims reported to the company. It is nevertheless important to understand the kind of relationship just evidenced: the actuary has to keep in mind that he has not established any causal relationship so far, but only that some correlations seem to exist between the rating factors and the number of claims. Such correlations may have been produced by a causal relationship, but could also result from confounding effects. The following simple example illustrates this situation.


Figure 2.6 Composition of Portfolio A with respect to Split (left panel) and observed annual claim frequencies according to Split (right panel).

Example 2.1 Assume that living in rural or urban areas (observed covariate District) does not influence claim occurrences but the driving experience (hidden characteristic) does. Specifically, let us assume that

$$\Pr[N \ge 1\,|\,\text{inexperienced, rural}] = \Pr[N \ge 1\,|\,\text{inexperienced, urban}] = \Pr[N \ge 1\,|\,\text{inexperienced}] = 0.15$$
$$\Pr[N \ge 1\,|\,\text{experienced, rural}] = \Pr[N \ge 1\,|\,\text{experienced, urban}] = \Pr[N \ge 1\,|\,\text{experienced}] = 0.05.$$

The portfolio comprises 50 % inexperienced drivers, but

$$\Pr[\text{inexperienced}\,|\,\text{urban}] = 1 - \Pr[\text{experienced}\,|\,\text{urban}] = 0.9$$

and

$$\Pr[\text{inexperienced}\,|\,\text{rural}] = 1 - \Pr[\text{experienced}\,|\,\text{rural}] = 0.1.$$


Figure 2.7 Composition of Portfolio A with respect to the Age–Gender interaction (left panel) and observed annual claim frequencies according to the Age–Gender interaction (right panel).

In other words, there is a majority of experienced drivers in rural areas (90 %) whereas in urban areas the majority of drivers are inexperienced. Clearly,

$$\Pr[N \geq 1 \mid \text{urban}] = \Pr[N \geq 1 \mid \text{inexperienced, urban}] \Pr[\text{inexperienced} \mid \text{urban}] + \Pr[N \geq 1 \mid \text{experienced, urban}] \Pr[\text{experienced} \mid \text{urban}] = 0.15 \times 0.9 + 0.05 \times 0.1 = 0.14$$

whereas

$$\Pr[N \geq 1 \mid \text{rural}] = \Pr[N \geq 1 \mid \text{inexperienced, rural}] \Pr[\text{inexperienced} \mid \text{rural}] + \Pr[N \geq 1 \mid \text{experienced, rural}] \Pr[\text{experienced} \mid \text{rural}] = 0.15 \times 0.1 + 0.05 \times 0.9 = 0.06.$$

Even if District does not influence the number of claims once driving experience has been accounted for, unconditionally there is some dependence between District and the number of claims filed by policyholders. The univariate analysis will detect a rural/urban effect, but the latter should disappear in the multivariate analysis (taking experience as well as the variable District into account). Therefore, we cannot say after a marginal analysis that living in an urban area causes an increase in the average number of claims: both variables are related to driving experience.
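The arithmetic behind this example is easy to check numerically. The sketch below (plain Python, with the probabilities of Example 2.1 hard-coded) recovers the marginal probabilities 0.14 and 0.06 by the law of total probability:

```python
# Claim probabilities from Example 2.1: District is irrelevant once
# driving experience (the hidden characteristic) is known.
p_claim = {"inexperienced": 0.15, "experienced": 0.05}

# Composition of each district with respect to the hidden characteristic.
mix = {
    "urban": {"inexperienced": 0.9, "experienced": 0.1},
    "rural": {"inexperienced": 0.1, "experienced": 0.9},
}

# Law of total probability: Pr[N >= 1 | district].
for district, weights in mix.items():
    prob = sum(p_claim[e] * w for e, w in weights.items())
    print(district, round(prob, 3))  # urban 0.14, rural 0.06
```

The marginal rural/urban contrast (0.14 versus 0.06) appears even though, conditionally on experience, District plays no role.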



The reader should always keep in mind that it is often not possible to disentangle the true effect of a rating factor from an apparent effect resulting from correlation with unobservable characteristics.

2.3 Poisson Regression Model

2.3.1 Coding Explanatory Variables

All the explanatory variables presented above are categorical (or nominal). A categorical variable with k levels partitions the portfolio into k classes (for instance, 4 classes for Age in Portfolio A). It is coded with the help of k − 1 binary variables, all of which equal zero for the reference level. The reference level is usually selected as the most populated class in the portfolio. The following example illustrates this coding methodology.

Example 2.2 In Portfolio A, the reference levels are '31–60' for Age, 'Male' for Gender, 'Urban' for District, 'Premium paid once a year' for Split and 'Private' for Use. Policyholder i is then represented by a vector of dummies giving the values of

$x_{i1} = 1$ if policyholder $i$ is aged less than 24, and 0 otherwise;
$x_{i2} = 1$ if policyholder $i$ is aged 25–30, and 0 otherwise;
$x_{i3} = 1$ if policyholder $i$ is aged over 60, and 0 otherwise;
$x_{i4} = 1$ if policyholder $i$ is a female, and 0 otherwise;
$x_{i5} = 1$ if policyholder $i$ lives in a rural area, and 0 otherwise;
$x_{i6} = 1$ if policyholder $i$ splits his premium payment, and 0 otherwise;
$x_{i7} = 1$ if policyholder $i$ uses his car for professional reasons, and 0 otherwise.

The results are interpreted with respect to the reference class (for which all the $x_{ij}$s are equal to 0), corresponding to a man aged between 31 and 60, living in an urban area, paying the premium once a year and using the car for private purposes only. The sequence (1,0,0,0,0,0,1) represents a man aged less than 24, living in an urban area, paying the premium once a year and using the car for professional reasons.
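This dummy coding is easy to automate. The sketch below is a minimal illustration using the pandas package; the data frame and its levels are invented to mimic Portfolio A and are not the actual portfolio data:

```python
import pandas as pd

# Toy policyholder records; the column names and levels mimic Portfolio A.
policies = pd.DataFrame({
    "Age":      ["18-24", "31-60", ">60"],
    "Gender":   ["Male", "Female", "Male"],
    "District": ["Urban", "Rural", "Urban"],
})

# Declare the reference level as the first category so that dropping it
# leaves k - 1 dummies, all zero for the reference class.
refs = {"Age": "31-60", "Gender": "Male", "District": "Urban"}
for col, ref in refs.items():
    levels = [ref] + sorted(set(policies[col]) - {ref})
    policies[col] = pd.Categorical(policies[col], categories=levels)

X = pd.get_dummies(policies, drop_first=True)  # drops each reference level
print(X)
```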

In case two covariates interact, a new variable is created by combining the levels of each of the interacting covariates, as shown in the following example.


Example 2.3 Figure 2.7 suggests that the variables Age and Gender interact in Portfolio A. It is thus not possible to represent accurately the effect of being a male policyholder (compared with being a female policyholder) in terms of a single multiplier, nor can the effect of Age be represented by a single multiplier. The relevant explanatory variables are not $x_{i1}$, $x_{i2}$, $x_{i3}$ and $x_{i4}$ but rather $x_{i1}x_{i4}$, $x_{i2}x_{i4}$, $x_{i3}x_{i4}$, $(1 - x_{i1} - x_{i2} - x_{i3})x_{i4}$, $x_{i1}(1 - x_{i4})$, $x_{i2}(1 - x_{i4})$ and $x_{i3}(1 - x_{i4})$ (denoted henceforth as $x'_{ij}$).

To reflect the situation accurately, it is thus necessary to consider multipliers depending on the combined levels of Age and Gender. To this end, a variable Gender ∗ Age is created, with levels 'Female 18–24', 'Female 25–30', 'Female 31–60', 'Female over 60', 'Male 18–24', 'Male 25–30', 'Male 31–60' and 'Male over 60'. This new variable possesses 8 levels, and is coded by means of 7 dummies, all of which are 0 for the reference level (taken as 'Male 31–60'). Specifically, the explanatory variables $x_{i1}$ to $x_{i4}$ in Example 2.2 are replaced with

$x'_{i1} = 1$ if policyholder $i$ is a female aged less than 24, and 0 otherwise;
$x'_{i2} = 1$ if policyholder $i$ is a female aged 25–30, and 0 otherwise;
$x'_{i3} = 1$ if policyholder $i$ is a female aged 31–60, and 0 otherwise;
$x'_{i4} = 1$ if policyholder $i$ is a female aged over 60, and 0 otherwise;
$x'_{i5} = 1$ if policyholder $i$ is a male aged less than 24, and 0 otherwise;
$x'_{i6} = 1$ if policyholder $i$ is a male aged 25–30, and 0 otherwise;
$x'_{i7} = 1$ if policyholder $i$ is a male aged over 60, and 0 otherwise.

Rather than declaring Age and Gender as two explanatory variables coded with the help of $x_{i1}$ to $x_{i4}$, a combined Age ∗ Gender variable is declared and is coded with the help of the covariates $x'_{i1}$ to $x'_{i7}$.

The part of the linear predictor involving Age and Gender is $\beta_1 x_{i1} + \cdots + \beta_4 x_{i4}$ without interaction, and becomes $\beta'_1 x'_{i1} + \cdots + \beta'_7 x'_{i7}$ if allowance is made for interaction. If Age and Gender indeed interact then the values of these two linear combinations differ from each other, whereas they coincide in the absence of interaction.

Allowing for interaction thus dramatically increases the number of explanatory variables: introducing Age and Gender separately requires 4 binary variables whereas allowing for the Age–Gender interaction requires 7 dummies. As a consequence, the number of parameters to be estimated increases accordingly. We will see below that grouping some levels of the combined variable accounting for interaction is nevertheless often possible.
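In code, the combined variable is obtained by pasting the levels together before dummy coding. A minimal sketch, again with invented data:

```python
import pandas as pd

policies = pd.DataFrame({
    "Age":    ["18-24", "31-60", "25-30"],
    "Gender": ["Male", "Male", "Female"],
})

# One combined factor instead of two separate ones.
policies["Gender*Age"] = policies["Gender"] + " " + policies["Age"]

# Reference cell 'Male 31-60' is listed first and then dropped, which would
# leave 7 dummies if all 8 Age-Gender cells were present in the data.
cells = ["Male 31-60"] + sorted(set(policies["Gender*Age"]) - {"Male 31-60"})
policies["Gender*Age"] = pd.Categorical(policies["Gender*Age"], categories=cells)
X = pd.get_dummies(policies["Gender*Age"], drop_first=True)
print(X)
```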


2.3.2 Loglinear Poisson Regression Model

In Poisson regression, we have a collection of independent Poisson counts whose means are modelled as nonnegative functions of covariates. Specifically, let $N_i$, $i = 1, 2, \ldots, n$, be the number of claims reported by policyholder $i$ and let $d_i$ be the corresponding risk exposure (the $N_i$s are assumed to be independent). All the observable characteristics related to this policyholder (the a priori variables presented in Section 2.2 for Portfolio A, say) are summarized in the vector $x_i^T = (x_{i1}, \ldots, x_{ip})$. Poisson regression is analogous to linear regression except that the randomness in the model is no longer described by Normally distributed errors but by a Poisson distribution. We first assume that the conditional expectation of $N_i$ given $x_i$ is of the form

$$\mathrm{E}[N_i \mid x_i] = d_i \exp\Big(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\Big), \qquad i = 1, 2, \ldots, n, \qquad (2.1)$$

where $\beta^T = (\beta_0, \beta_1, \ldots, \beta_p)$ is the vector of unknown regression coefficients. The explanatory variables enter the model through the linear combination $\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}$, where $\beta_0$ acts as an intercept and $\beta_j$ is the coefficient indicating the weight given to the $j$th covariate. The Poisson regression model consists in stating that $N_i$ is Poisson distributed with mean given by Expression (2.1), that is,

$$N_i \sim \mathcal{P}oi\Big(d_i \exp\Big(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\Big)\Big), \qquad i = 1, 2, \ldots, n.$$

2.3.3 Score

The quantity

$$\mathrm{score}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}$$

is called the score (or linear predictor in statistics) because it allows the actuary to rank the policyholders from the least to the most dangerous. The expected claim number for policyholder $i$ is $d_i \exp(\mathrm{score}_i)$; the corresponding annual claim frequency is $\exp(\mathrm{score}_i)$. Increasing the score thus means that the associated average annual claim frequency increases. The use of the exponential link function ensures that the claim frequency is positive even if the score is negative.

Let us denote as $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$ the estimators of the regression coefficients $\beta_0, \beta_1, \ldots, \beta_p$. In a statistical sense,

$$\hat\lambda_i = d_i \exp(\widehat{\mathrm{score}}_i) = d_i \exp\Big(\hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j x_{ij}\Big)$$

is the predicted expected number of claims for policyholder $i$. Prediction in this sense does not refer to 'predicting the future' (called forecasting by statisticians) but rather to guessing the expected number of claims (i.e., the response) from the values of the regressors in an observation taken under the same circumstances as the sample from which the regression equation was estimated.


2.3.4 Multiplicative Tariff

When the a priori variables $x_{ij}$ are coded by means of binary variables (so that each policyholder is represented by a vector of 0s and 1s), the intercept $\beta_0$ represents the risk associated with the reference class. The annual claim frequency $\lambda_i$ associated with characteristics $x_i$ is then of the following multiplicative form:

$$\lambda_i = \exp(\beta_0) \prod_{j\,:\,x_{ij}=1} \exp(\beta_j),$$

where $\exp(\beta_0)$ is the annual claim frequency corresponding to the reference class and $\exp(\beta_j)$ models the impact of the $j$th ratemaking variable. If $\beta_j > 0$, being in the class coded by the $j$th explanatory variable increases the score and thus the annual claim frequency. Conversely, if $\beta_j < 0$, the score and the annual claim frequency decrease when $x_{ij} = 1$. With the exponential link, high (respectively low) scores mean high (respectively low) claim frequencies. The following example makes this clear.

Example 2.4 Let us continue Example 2.2. In Portfolio A, the score for policyholder $i$ is of the form

$$\mathrm{score}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_7 x_{i7}.$$

Here,

$\exp(\beta_0)$ = annual claim frequency for men aged 31–60, living in an urban area, paying the premium once a year, using the car for private purposes;
$\exp(\beta_0 + \beta_1)$ = annual claim frequency for men aged less than 24, living in an urban area, paying the premium once a year, using the car for private purposes;
$\exp(\beta_0 + \beta_2)$ = annual claim frequency for men aged 25–30, living in an urban area, paying the premium once a year, using the car for private purposes;
$\exp(\beta_0 + \beta_3)$ = annual claim frequency for men over 60, living in an urban area, paying the premium once a year, using the car for private purposes;
$\exp(\beta_0 + \beta_4)$ = annual claim frequency for women aged 31–60, living in an urban area, paying the premium once a year, using the car for private purposes;

etc.
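Numerically, the tariff is obtained by exponentiating partial sums of coefficients. The following sketch uses made-up coefficient values (not the fitted values for Portfolio A) purely to show the multiplicative structure:

```python
import math

# Hypothetical coefficients (illustrative values only).
beta0 = -2.2   # reference class: man 31-60, urban, annual premium, private use
beta = {"age<24": 0.64, "rural": -0.17, "split": 0.32}

base = math.exp(beta0)   # annual claim frequency of the reference class
# A young rural driver: multiply the base frequency by one factor per dummy equal to 1.
freq = base * math.exp(beta["age<24"]) * math.exp(beta["rural"])
print(round(base, 4), round(freq, 4))
```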


2.3.5 Likelihood Equations

Let $k_i$ be the number of claims filed by policyholder $i$ during the observation period. The likelihood associated with these observations equals

$$\mathcal{L}(\beta) = \prod_{i=1}^{n} \Pr[N_i = k_i \mid x_i] = \prod_{i=1}^{n} \exp(-\lambda_i) \frac{\lambda_i^{k_i}}{k_i!},$$

where

$$\lambda_i = d_i \exp(\mathrm{score}_i) = \exp(\ln d_i + \mathrm{score}_i).$$

The maximum likelihood estimator $\hat\beta$ of $\beta$ maximizes $\mathcal{L}(\beta)$: $\hat\beta$ is the value of the regression coefficients that makes the observations the most plausible.

Remark 2.1 (Grouping Data) The maximum likelihood estimators obtained for individual or grouped data are identical. Let us prove it formally. To this end, let $\mathcal{L}_{\mathrm{group}}$ be the likelihood obtained after a grouping in risk classes. We will show below that

$$\mathcal{L}(\beta) \propto \mathcal{L}_{\mathrm{group}}(\beta).$$

In words, the likelihood $\mathcal{L}_{\mathrm{group}}$ based on grouped data is proportional to the likelihood $\mathcal{L}$ based on individual data, so that the corresponding maximum likelihood estimates will coincide.

Let $s_1, \ldots, s_q$ be the $q$ possible values for the score, say, and let us define

$$d_{\bullet j} = \sum_{i : \mathrm{score}_i = s_j} d_i \quad \text{and} \quad k_{\bullet j} = \sum_{i : \mathrm{score}_i = s_j} k_i \quad \text{for } j = 1, \ldots, q.$$

In words, $d_{\bullet j}$ is the total risk exposure for risk class $j$ (corresponding to the value $s_j$ of the score) and $k_{\bullet j}$ is the total number of claims recorded for the same risk class. Then

$$\mathcal{L}(\beta) = \prod_{j=1}^{q} \prod_{i : \mathrm{score}_i = s_j} \exp(-\lambda_i) \frac{\lambda_i^{k_i}}{k_i!} \propto \prod_{j=1}^{q} \exp\Big(-\sum_{i : \mathrm{score}_i = s_j} \lambda_i\Big) \big(\exp(s_j)\, d_{\bullet j}\big)^{k_{\bullet j}}$$

$$= \prod_{j=1}^{q} \exp\Big(-\exp(s_j) \sum_{i : \mathrm{score}_i = s_j} d_i\Big) \big(\exp(s_j)\, d_{\bullet j}\big)^{k_{\bullet j}} \propto \prod_{j=1}^{q} \exp\big(-\exp(s_j)\, d_{\bullet j}\big) \frac{\big(\exp(s_j)\, d_{\bullet j}\big)^{k_{\bullet j}}}{k_{\bullet j}!} = \mathcal{L}_{\mathrm{group}}(\beta).$$

Maximizing $\mathcal{L}$ or $\mathcal{L}_{\mathrm{group}}$ then gives the same maximum likelihood estimator $\hat\beta$.
Maximizing or group then gives the same maximum likelihood estimator ̂.


The computation is easier if we switch from the likelihood to the log-likelihood, which is given by

$$L(\beta) = \ln \mathcal{L}(\beta) = \sum_{i=1}^{n} \big(-\ln k_i! + k_i \ln \lambda_i - \lambda_i\big). \qquad (2.2)$$

The maximum likelihood estimators $\hat\beta_0$ and the $\hat\beta_j$s are the solutions of the following likelihood equations, obtained by setting the first derivatives of the log-likelihood with respect to the regression coefficients equal to zero:

$$\frac{\partial}{\partial \beta_0} L(\beta) = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} (k_i - \lambda_i) = 0 \qquad (2.3)$$

$$\frac{\partial}{\partial \beta_j} L(\beta) = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} x_{ij} (k_i - \lambda_i) = 0, \qquad j = 1, \ldots, p. \qquad (2.4)$$

2.3.6 Interpretation of the Likelihood Equations

Equation (2.3) has an obvious interpretation: the fitted total number of claims $\sum_{i=1}^{n} \hat\lambda_i$ is equal to the observed total number of claims $\sum_{i=1}^{n} k_i$. Therefore, provided that an intercept $\beta_0$ is included in the score, the total claim number predicted by the regression model equals its observed counterpart. Note that this equality holds for the observation period and not necessarily for the future, when the ratemaking will be implemented in practice. In other words, we cannot be sure that $\sum_{i=1}^{n} \hat\lambda_i$ claims will be filed in the future, just that the actual number of claims should be close to $\sum_{i=1}^{n} \hat\lambda_i$ if the yearly number of claims filed to the company remains stable over time.

The interpretation of the second likelihood Equation (2.4) is as follows. In Example 2.2 with Portfolio A, Equation (2.4) for $j = 4$ gives

$$\sum_{i \in \text{females}} k_i = \sum_{i \in \text{females}} \hat\lambda_i.$$

Therefore, the model fits exactly the total number of claims filed by female policyholders: there are no cross-subsidies between men and women. The conclusion is similar for the other values of $j$. For $j = 1, 2, 3$, for instance, the Equations (2.4) ensure that the sum of all the claims reported for each age category is exactly reproduced by the model.

2.3.7 Solving the Likelihood Equations with the Newton–Raphson Algorithm

The likelihood equations do not admit explicit solutions and must therefore be solved numerically. Let $U(\beta)$ be the gradient vector of the log-likelihood $L(\beta) = \ln \mathcal{L}(\beta)$ defined in (1.47). Let us define $\tilde{x}_i$ as the vector $x_i$ of explanatory variables for policyholder $i$ supplemented with a unit first component, that is, $\tilde{x}_i = (1, x_i^T)^T$. Then, considering (2.3)–(2.4), $U(\beta)$ is given by

$$U(\beta) = \sum_{i=1}^{n} \tilde{x}_i (k_i - \lambda_i) \qquad (2.5)$$

in the Poisson regression model. Let $H(\beta)$ be the Hessian matrix of $L(\beta)$ defined in (1.50). Specifically, $H(\beta)$ is given by

$$H(\beta) = -\sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^T \lambda_i \qquad (2.6)$$

in the Poisson regression model. The maximum likelihood estimator $\hat\beta$ of the parameters $\beta$ then solves $U(\beta) = 0$.

The approach used to solve the likelihood equations is the Newton–Raphson algorithm (1.51). Starting from an appropriate $\hat\beta^{(0)}$, the Newton–Raphson algorithm is based on the following iteration:

$$\hat\beta^{(r+1)} = \hat\beta^{(r)} - \Big(H\big(\hat\beta^{(r)}\big)\Big)^{-1} U\big(\hat\beta^{(r)}\big) = \hat\beta^{(r)} + \Big(\sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^T \hat\lambda_i^{(r)}\Big)^{-1} \sum_{i=1}^{n} \tilde{x}_i \big(k_i - \hat\lambda_i^{(r)}\big) \qquad (2.7)$$

for $r = 0, 1, 2, \ldots$, where $\hat\lambda_i^{(r)} = d_i \exp\big(\tilde{x}_i^T \hat\beta^{(r)}\big)$. Appropriate starting values are given by

$$\hat\beta_0^{(0)} = \ln\Big(\frac{1}{n} \sum_{i=1}^{n} k_i\Big) \quad \text{and} \quad \hat\beta_j^{(0)} = 0 \text{ for } j = 1, \ldots, p.$$

Note that these starting values are equal to the values of the regression coefficients when no segmentation is in force. Therefore, final values close to the starting ones indicate that the portfolio is quite homogeneous.
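The iteration (2.7) is short to implement outside specialized software. The following numpy sketch (our own minimal implementation on simulated data, not the procedure used later for Portfolio A) applies exactly the gradient (2.5), the Hessian (2.6) and the starting values above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 3
X = rng.binomial(1, 0.4, size=(n, p))          # binary covariates
d = rng.uniform(0.5, 1.0, size=n)              # exposures d_i
beta_true = np.array([-2.0, 0.5, 0.3, -0.2])
Xt = np.column_stack([np.ones(n), X])          # tilde x_i with unit first component
k = rng.poisson(d * np.exp(Xt @ beta_true))    # observed claim counts

# Starting values: log of the mean claim number, slopes at zero.
beta = np.zeros(p + 1)
beta[0] = np.log(k.mean())

for _ in range(25):                            # Newton-Raphson iteration (2.7)
    lam = d * np.exp(Xt @ beta)                # current fitted means
    U = Xt.T @ (k - lam)                       # gradient (2.5)
    H = -(Xt * lam[:, None]).T @ Xt            # Hessian (2.6)
    step = np.linalg.solve(H, U)
    beta = beta - step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)  # close to beta_true
```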

Remark 2.2 (Iterative Least-Squares) It is possible to interpret the Newton–Raphson approach (2.7) in terms of iterative least-squares. Specifically, it is possible to rewrite the iterative algorithm (2.7) in the Poisson model in such a way that $\hat\beta^{(r+1)}$ appears as the maximum likelihood estimator in a linear model with adjusted dependent variables. Fitting the Poisson regression model by maximum likelihood thus boils down to estimating the regression parameter in a sequence of linear models, with adjusted responses and explanatory variables. This is particularly interesting since the numerical aspects of estimation in a linear model are well known and have been optimized for decades.


2.3.8 Wald Confidence Intervals

The asymptotic variance-covariance matrix $\Sigma_{\hat\beta}$ of the maximum likelihood estimator $\hat\beta$ of the regression coefficients vector $\beta$ is the inverse of the Fisher information matrix. This matrix can be estimated by

$$\hat\Sigma_{\hat\beta} = \Big(\sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^T \hat\lambda_i\Big)^{-1} \quad \text{where } \hat\lambda_i = d_i \exp(\widehat{\mathrm{score}}_i).$$

We know from (1.49) that, provided the sample size is large enough, $\hat\beta - \beta$ is approximately $\mathcal{N}or(0, \hat\Sigma_{\hat\beta})$ distributed. It is thus possible to compute confidence intervals at level $1 - \alpha$ for each of the $\beta_j$s. These intervals are of the form

$$\Big[\hat\beta_j - z_{\alpha/2}\, \hat\sigma_{\hat\beta_j},\; \hat\beta_j + z_{\alpha/2}\, \hat\sigma_{\hat\beta_j}\Big] \qquad (2.8)$$

where $\hat\sigma^2_{\hat\beta_j}$ is the estimated variance of $\hat\beta_j$, given by the element $(j, j)$ of $\hat\Sigma_{\hat\beta}$.
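Once $\hat\beta$ and $\hat\Sigma_{\hat\beta}$ are available, the interval (2.8) is immediate; a small sketch with hypothetical fitted values:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical output of a Poisson fit (illustrative values only).
beta_hat = np.array([-2.20, 0.64, -0.17])
cov_hat = np.diag([0.0034, 0.0063, 0.0029])    # estimated covariance matrix

z = norm.ppf(1 - 0.05 / 2)                     # z_{alpha/2} for a 95 % interval
se = np.sqrt(np.diag(cov_hat))
lower, upper = beta_hat - z * se, beta_hat + z * se
for j in range(len(beta_hat)):
    print(f"beta_{j}: [{lower[j]:.4f}, {upper[j]:.4f}]")
```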

Remark 2.3 (Confidence Intervals for the $\beta_j$s: the Likelihood Ratio Method) The confidence interval (2.8) is based on the large sample properties of the maximum likelihood estimator $\hat\beta$. Other methods for constructing such a confidence interval are available. One such method is based on the profile likelihood for $\beta_j$, defined as

$$\mathcal{L}_j(\beta_j) = \max_{\beta_0, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p} \mathcal{L}(\beta).$$

If $\hat\beta$ is the maximum likelihood estimator of $\beta$, we have that $2\big(L(\hat\beta) - \ln \mathcal{L}_j(\beta_j)\big)$ is approximately $\chi^2_1$ distributed provided $\beta_j$ is the true parameter value. A confidence interval at level $1 - \alpha$ for $\beta_j$ is then given by

$$\Big\{\beta_j \;\Big|\; \ln \mathcal{L}_j(\beta_j) \geq L(\hat\beta) - \tfrac{1}{2} \chi^2_{1, 1-\alpha}\Big\},$$

where $\chi^2_{1, 1-\alpha}$ is the $(1-\alpha)$th quantile of the $\chi^2_1$ distribution.

2.3.9 Testing a Hypothesis on a Single Parameter

It is often interesting to check the validity of the null hypothesis $H_0 : \beta_j = 0$ against the alternative $H_1 : \beta_j \neq 0$. If the $j$th explanatory variable is dichotomous (think for instance of gender), then failing to reject $H_0$ suggests that this variable is not relevant to explaining the expected number of claims. If the $j$th explanatory variable is coded by means of a set of binary variables, then the nullity of the regression coefficient associated with one of the binary variables means that the corresponding level can be grouped with the reference level. In such a case, equality between the regression coefficients should also be tested to decide about the optimal grouping of the levels. Hypotheses involving a set of regression parameters will be examined in Section 2.3.13.


Considering (1.52), the easiest test statistic for $H_0$ against $H_1$ is certainly

$$T = \frac{\hat\beta_j}{\hat\sigma_{\hat\beta_j}},$$

which is approximately $\mathcal{N}or(0, 1)$ distributed under $H_0$, provided the sample size is large enough. Alternatively, $T^2$ is approximately Chi-square distributed with one degree of freedom. Rejection of $H_0$ occurs when $T$ is large in absolute value, or when $T^2$ is large. In this case, $\beta_j$ is significantly different from 0, and the associated characteristic has a significant impact in ratemaking. Note that, as always with hypothesis testing, statistical significance is the result of two effects: firstly, the distance between the true parameter value and the hypothesized value, and secondly, the number of observations (or, more precisely, the amount of information contained in the data). Even a hypothesis that is approximately true (and useful as a working hypothesis) will be rejected with a sufficiently large sample. Conversely, any hypothesis may fail to be rejected (and be accepted as a working hypothesis) as long as the actuary has only scanty data at his disposal. The reader should keep this in mind in the numerical illustrations worked out in this book.

If the explanatory variables are correlated (as is usually the case in actuarial studies), it becomes difficult to disentangle the effects of one explanatory variable from another, and the parameter estimates may depend heavily on which explanatory variables are used in the model. If the explanatory variables are strongly correlated then the maximum likelihood estimators will have a large variance. The actuary should then reduce the set of regressors.

2.3.10 Confidence Interval for the Expected Annual Claim Frequency

It is possible to build a confidence interval for the annual claim frequency. Recall that the multivariate Normal distribution has the following useful invariance property: let $C$ be a given $n \times n$ matrix with real entries and let $b$ be an $n$-dimensional real vector; if $X \sim \mathcal{N}or(\mu, M)$ then $Y = CX + b$ is $\mathcal{N}or(C\mu + b, CMC^T)$ distributed. The variance of the predicted score, $\widehat{\mathrm{score}}_i = \tilde{x}_i^T \hat\beta$, is thus given by

$$V[\widehat{\mathrm{score}}_i] = \tilde{x}_i^T \Sigma_{\hat\beta} \tilde{x}_i,$$

which is estimated by

$$\hat{V}[\widehat{\mathrm{score}}_i] = \tilde{x}_i^T \hat\Sigma_{\hat\beta} \tilde{x}_i.$$

As the maximum likelihood estimator $\hat\beta$ is approximately Gaussian when the number of policies is large, $\widehat{\mathrm{score}}_i$ is also Gaussian and an approximate confidence interval at level $1 - \alpha$ for the annual claim frequency can be computed as

$$\bigg[\exp\Big(\hat\beta^T \tilde{x}_i - z_{\alpha/2} \sqrt{\tilde{x}_i^T \hat\Sigma_{\hat\beta} \tilde{x}_i}\Big),\; \exp\Big(\hat\beta^T \tilde{x}_i + z_{\alpha/2} \sqrt{\tilde{x}_i^T \hat\Sigma_{\hat\beta} \tilde{x}_i}\Big)\bigg].$$


2.3.11 Deviance

Let $\mathcal{L}(\hat\lambda)$ be the model likelihood, i.e.

$$\mathcal{L}(\hat\lambda) = \prod_{i=1}^{n} \exp(-\hat\lambda_i) \frac{\hat\lambda_i^{k_i}}{k_i!}.$$

Note that the maximal value of $\lambda \mapsto \exp(-\lambda) \lambda^k / k!$ is obtained for $\lambda = k$. Therefore, the Poisson likelihood is maximum with expected claim frequencies equal to the observed numbers of claims. The maximal likelihood possible under the Poisson assumption is then

$$\mathcal{L}(k) = \prod_{i=1}^{n} \exp(-k_i) \frac{k_i^{k_i}}{k_i!}.$$

This is the likelihood of the saturated model, predicting the observed number of claims for each insured driver (there are thus as many parameters as observations). This model just replicates the observed data.

The deviance $D(k, \hat\lambda)$ is defined as the likelihood ratio test statistic for the current model against the saturated model, that is,

$$D(k, \hat\lambda) = -2 \ln \frac{\mathcal{L}(\hat\lambda)}{\mathcal{L}(k)} = 2 \big(\ln \mathcal{L}(k) - \ln \mathcal{L}(\hat\lambda)\big) = 2 \sum_{i=1}^{n} \Big(k_i \ln \frac{k_i}{\hat\lambda_i} - (k_i - \hat\lambda_i)\Big),$$

where $y \ln y = 0$ for $y = 0$ by convention. It measures the distance from the model likelihood to the saturated model replicating the observed data. The smaller the deviance, the better the current model.

When an intercept $\beta_0$ is included in the linear predictor, (2.3) allows us to simplify the deviance to

$$D(k, \hat\lambda) = 2 \sum_{i=1}^{n} k_i \ln \frac{k_i}{\hat\lambda_i}.$$

Provided the data have been grouped as much as possible and the model is correct, $D$ is approximately $\chi^2_{n - \dim(\beta)}$ distributed (where $n$ is now the number of classes in the portfolio). The model is considered inappropriate if $D_{\mathrm{obs}}$ is 'too large', that is, if

$$D_{\mathrm{obs}} > \chi^2_{n - \dim(\beta),\, 1-\alpha}.$$


2.3.12 Deviance Residuals

The deviance residuals in the Poisson model are given by

$$r_i^D = \mathrm{sign}(k_i - \hat\lambda_i)\, \sqrt{2} \sqrt{k_i \ln \frac{k_i}{\hat\lambda_i} - (k_i - \hat\lambda_i)}.$$

Summing the $(r_i^D)^2$ gives the deviance $D(k, \hat\lambda)$. The deviance residual $r_i^D$ is thus the signed square root of the contribution of policyholder $i$ to the deviance $D(k, \hat\lambda)$.

A plot for individual data is often uninformative in motor insurance (because of the few observed values for the $k_i$s, the deviance residuals $r_i^D$ are always structured, being concentrated along curves corresponding to 0 claims, 1 claim, 2 claims, etc.). A plot of the residuals against fitted frequencies $\hat\lambda_i$ for grouped data helps to check the adequacy of the model.
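A direct transcription in numpy, using xlogy to honour the convention $y \ln y = 0$ for $y = 0$:

```python
import numpy as np
from scipy.special import xlogy

def deviance_residuals(k, lam):
    """Signed square roots of the contributions to the Poisson deviance."""
    k, lam = np.asarray(k, float), np.asarray(lam, float)
    contrib = 2.0 * (xlogy(k, k / lam) - (k - lam))   # 2(k ln(k/lam) - (k - lam))
    return np.sign(k - lam) * np.sqrt(np.maximum(contrib, 0.0))

k = np.array([0, 1, 2, 0])
lam = np.array([0.10, 0.15, 0.12, 0.30])
r = deviance_residuals(k, lam)
print(r, (r ** 2).sum())   # the sum of squares is the deviance D(k, lam_hat)
```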

2.3.13 Testing a Hypothesis on a Set of Parameters

We would like to test the null hypothesis

$$H_0 : \beta = \beta^{(0)} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_q)^T \quad \text{against} \quad H_1 : \beta = \beta^{(1)} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_q, \beta_{q+1}, \ldots, \beta_p)^T.$$

Let $D_0$ be the deviance of the Poisson regression model under $H_0$, and $D_1$ be the deviance under $H_1$. The test statistic is

$$\Xi = D_0 - D_1 = 2 \big(L(k) - L(\hat\beta^{(0)})\big) - 2 \big(L(k) - L(\hat\beta^{(1)})\big) = 2 \big(L(\hat\beta^{(1)}) - L(\hat\beta^{(0)})\big) \approx_d \chi^2_{p-q}.$$

Note that $\Xi$ is a likelihood ratio test statistic. The null hypothesis $H_0$ is rejected in favour of $H_1$ if $\Xi_{\mathrm{obs}}$ is 'too large', that is, if

$$\Xi_{\mathrm{obs}} > \chi^2_{p-q,\, 1-\alpha}.$$

2.3.14 Specification Error and Robust Inference

According to the asymptotic theory related to generalized linear models, the estimator $\hat\beta$ obtained by maximizing the Poisson likelihood remains consistent and efficient provided the mean and variance of the model are correctly specified (even if the underlying data generating process is not Poisson). Moreover, in order to obtain consistency, only the correct specification of the mean function is required, that is, only (2.1) has to be valid. And $\hat\beta$ remains Normally distributed in all cases.

Let us now assume that $\mathrm{E}[N_i \mid x_i] = d_i \exp(\beta_0^T x_i)$ holds true for some $\beta_0$ (i.e. the conditional mean is correctly specified), but $N_i$ is not Poisson distributed ($N_i$ is in reality Negative Binomial, for instance). In this case, there is a specification error, in that inference conducted with the Poisson likelihood is based on a false distributional assumption. The Poisson maximum likelihood estimator $\hat\beta$ nevertheless remains consistent for the true parameter $\beta_0$, i.e. $\hat\beta \to_{\mathrm{proba}} \beta_0$ as the sample size $n \to +\infty$. This explains why the Poisson regression model is so useful: it continues to give reliable estimations of the annual expected claim frequency even if the true model is not Poisson, provided the sample size is large enough. However, the variances of the $\hat\beta_j$s are mis-estimated. Inference must then be based on the robust information matrix estimate of the variance-covariance matrix of $\hat\beta$, based on the empirical estimate of the observed information. Specifically, because of the misspecification, the asymptotic variance-covariance matrix of $\hat\beta$ is now given by

$$\Sigma_{\hat\beta} = \Phi^{-1} \Omega\, \Phi^{-1},$$

where

$$\Phi = \sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^T\, d_i \exp(\beta_0^T \tilde{x}_i) \quad \text{and} \quad \Omega = \sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^T\, V[N_i \mid x_i].$$

In practice, it will be estimated by

$$\hat\Sigma_{\hat\beta} = \hat\Phi^{-1} \hat\Omega\, \hat\Phi^{-1} \qquad (2.9)$$

where

$$\hat\Phi = \sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^T\, \hat\lambda_i \quad \text{and} \quad \hat\Omega = \sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^T\, (\hat\lambda_i - k_i)^2,$$

with $\hat\lambda_i = d_i \exp(\hat\beta^T \tilde{x}_i)$. Let us point out that $\hat\beta - \beta$ remains approximately Normally distributed with mean 0 and covariance matrix $\Sigma_{\hat\beta}$, provided the sample size is large enough.
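The estimate (2.9) takes only a few lines once $\hat\beta$ has been computed. The sketch below (simulated, deliberately non-Poisson data) contrasts the naive and the robust standard errors:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
Xt = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])
d = np.ones(n)
theta = rng.gamma(1.0, 1.0, n)                  # hidden heterogeneity -> not Poisson
k = rng.poisson(d * np.exp(Xt @ np.array([-2.0, 0.5])) * theta)

beta = np.array([np.log(k.mean()), 0.0])        # Poisson ML via Newton-Raphson
for _ in range(25):
    lam = d * np.exp(Xt @ beta)
    beta += np.linalg.solve((Xt * lam[:, None]).T @ Xt, Xt.T @ (k - lam))

lam = d * np.exp(Xt @ beta)
Phi = (Xt * lam[:, None]).T @ Xt                # model-based information
Omega = (Xt * ((lam - k) ** 2)[:, None]).T @ Xt # empirical middle term

naive = np.linalg.inv(Phi)
robust = naive @ Omega @ naive                  # sandwich estimate (2.9)
print(np.sqrt(np.diag(naive)), np.sqrt(np.diag(robust)))  # robust s.e. larger here
```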

2.3.15 Numerical Illustration

Within SAS®, the GENMOD procedure can be used to fit Poisson regression models. This procedure supports the Normal, Binomial, Poisson, Gamma, Inverse Gaussian, Negative Binomial and Multinomial distributions, in the framework of generalized linear models. A typical use of the GENMOD procedure is to perform Poisson regression with a log link function. This type of model is usually called a loglinear model.

The logarithm of the exposure-to-risk is used as an offset, that is, a regression variable with a constant coefficient of 1 for each observation. A loglinear relationship between the mean and the explanatory factors is specified by the log link function. The log link function ensures that the mean number of insurance claims predicted from the fitted model is positive.
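Readers without access to SAS® can reproduce the same loglinear Poisson fit with a log-exposure offset in other environments; for instance, a minimal sketch with the Python package statsmodels on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.binomial(1, 0.3, size=(n, 2))          # two binary rating factors
d = rng.uniform(0.25, 1.0, size=n)             # exposure-to-risk in years
X = sm.add_constant(x)
k = rng.poisson(d * np.exp(X @ np.array([-2.0, 0.4, -0.3])))

# log d_i enters as an offset: a regressor whose coefficient is fixed at 1.
fit = sm.GLM(k, X, family=sm.families.Poisson(), offset=np.log(d)).fit()
print(fit.summary())
```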

The results obtained from the Poisson regression for Portfolio A presented in Section 2.2 are shown in Table 2.1. Table 2.1 is similar to the 'Analysis of Parameter Estimates' table produced by the GENMOD procedure. Such a table summarizes the results of the iterative parameter estimation process. For each parameter in the model, the GENMOD procedure displays columns with the parameter name, the degrees of freedom associated with the parameter, the estimated parameter value, the standard error of the parameter estimate, the Wald 95 % confidence limits, and the Chi-square statistic together with its p-value.

Table 2.1 Results of the Poisson regression for the model with the 5 explanatory variables, Portfolio A (only the rows that survived the extraction are reproduced).

Variable       Level         Coeff     Std error   Wald 95 % conf limit   Chi-sq    Pr>Chi-sq
Intercept                   −2.2131     0.0582     (−2.3271, −2.0991)    1447.40     <.0001
Gender ∗ Age   Female >60   −0.0010     0.2350     (−0.4616, 0.4596)        0.00     0.9967
Gender ∗ Age   Male 18–24    0.6429     0.0797     (0.4867, 0.7990)        65.10     <.0001
(the remaining rows of the table were lost in the extraction)


The ratio of the deviance to the number of degrees of freedom should be close to 1 to indicate goodness-of-fit. Here, we obtain 0.5357 (the deviance is equal to 7764.83). Note however that the data should be grouped to make the Chi-square approximation more reliable, so that we cannot use this statistic at the present stage.

An important aspect of insurance ratemaking with generalized regression models is the selection of the explanatory variables entering the model. Changes in goodness-of-fit statistics are often used to evaluate the contribution of subsets of explanatory variables to a particular model. One strategy for variable selection is to fit a sequence of models, beginning with a simple model with only an intercept term, and then to include one additional explanatory variable in each successive model. The importance of the additional explanatory variable can be measured by the difference in fitted log-likelihoods between successive models. Asymptotic tests computed by the GENMOD procedure enable the actuary to assess the statistical significance of the additional term (this is called a Type 1 analysis in SAS®).

Another strategy (adopted here) consists of starting from a model incorporating all the available information, and then excluding the irrelevant explanatory variables. To this end, the GENMOD procedure generates a Type 3 analysis (analogous to Type III sums of squares in the GLM procedure). A Type 3 analysis does not depend on the order in which the terms of the model are specified (in contrast to the Type 1 analysis).

A Type 3 analysis compares the complete model (that is, the model including all the specified variables) with the different submodels obtained by deleting one of the explanatory variables. It enables the actuary to test the relevance of one variable taking all the others into account. It roughly corresponds to the backward approach: at each step, we exclude the variable with the largest p-value until no more can be excluded (i.e. until all the p-values are smaller than a fixed threshold, generally 5 %). Note that the Type 3 analysis works with the variables and not with the levels of these variables. Indeed, it is possible to obtain a relevant variable for which some levels are not relevant. The results of the Type 3 analysis are as follows:

Source          DF   Chi-square   Pr>Chi-sq
Gender ∗ Age     7        75.16      <.0001
(the remaining rows were lost in the extraction)


The option 'Estimate' of GENMOD (which is similar to the option 'Contrast') can be used to assess the relevance of grouping the levels of the Age–Gender interaction two by two. This option computes likelihood ratio statistics for user-defined contrasts, that is, linear functions of the parameters, and p-values based on their asymptotic Chi-square distributions. Here, the option 'Estimate' is used to test for the equality of the regression coefficients (that is, for $H_0 : \beta_{j_1} = \beta_{j_2}$ against $H_1 : \beta_{j_1} \neq \beta_{j_2}$, for all the combinations $(j_1, j_2)$ corresponding to binary variables coding Age ∗ Gender). The test statistic is based on the ratio of the likelihoods corresponding to the models under $H_0$ and under $H_1$. This grouping process can be summarized as follows:

Step 1: the reference level 'Male 31–60' is first merged with 'Female >60' (p-value of 99.67 %);
Step 2: then 'Male 25–30' and 'Female 18–24' are grouped together (p-value of 85.85 %);
Step 3: 'Male 31–60', 'Male >60' and 'Female >60' are grouped together (p-value of 66.13 %);
Step 4: 'Male 31–60', 'Male >60', 'Female 31–60' and 'Female >60' are grouped together (p-value of 35.17 %);
Step 5: finally, 'Male 25–30', 'Female 18–24' and 'Female 25–30' are grouped together (p-value of 12.39 %).

The level 'Male 18–24' cannot be grouped with any other level. After grouping, the variable Age ∗ Gender resulting from the interaction of Age with Gender has three levels: 'Males 18–24', 'Females 18–30 and Males 25–30', and 'Males and Females over 30'. As expected, it accounts for the extra risk of young male drivers (aged between 18 and 24) and of the young drivers in general, whereas the reference level is assigned to drivers over 30.

The fit of the final model is shown in Table 2.2. The log-likelihood is now equal to −5484.2 and the Type 3 analysis gives the following results:

Source          DF   Chi-square   Pr>Chi-sq
Gender ∗ Age     2        71.66      <.0001
(the remaining rows were lost in the extraction)


Table 2.2 Results of the Poisson regression for the final model, Portfolio A (only the rows that survived the extraction are reproduced).

Variable     Level    Coeff     Std error   Wald 95 % conf limit   Chi-sq    Pr>Chi-sq
Intercept            −2.1975     0.0466     (−2.2888, −2.1062)    2225.51     <.0001
(the remaining rows were lost in the extraction)


[Figure 2.8 appears here: two scatter plots of deviance residuals against predicted values, for individual data (top panel) and for data grouped by risk classes (bottom panel).]

Figure 2.8 Deviance residuals against fitted annual claim frequencies for individual data (top panel) and grouped by risk classes (bottom panel), Portfolio A.

The significance of the explanatory variables is clearly apparent (even if the Chi-square test statistics are smaller than in the regular Poisson case, and the corresponding p-values larger).


Table 2.3 Results of the robust Poisson regression for the final model, Portfolio A (only the rows that survived the extraction are reproduced).

Variable     Level    Coeff     Std error   Wald 95 % conf limit      Z       Pr > Z
Intercept            −2.1975     0.0487     (−2.2930, −2.1020)     −45.10      <.0001
(the remaining rows were lost in the extraction)


In the class $C_1 \cup C_2$, the expected claim number equals

$$m = p_1 m_1 + p_2 m_2,$$

where $p_1$ and $p_2$ denote the respective weights of $C_1$ and $C_2$ (the ratio of the class exposure to the sum of the exposures of both classes, say). Considering the variance of the number of claims in $C_1 \cup C_2$, it can be decomposed as the average of the conditional variances $\sigma_1^2$ and $\sigma_2^2$ plus the variance of the conditional means $m_1$ and $m_2$ (that is, the weighted sum of their squared differences with respect to the grand mean $m$). The variance thus becomes

$$\underbrace{p_1 \sigma_1^2 + p_2 \sigma_2^2}_{=m} + \underbrace{p_1 (m_1 - m)^2}_{>0} + \underbrace{p_2 (m_2 - m)^2}_{>0} > m$$

(the first term equals $m$ because the claim numbers are Poisson distributed within each class, so that $\sigma_1^2 = m_1$ and $\sigma_2^2 = m_2$), which exceeds the mean. Hence, omitting relevant ratemaking variables induces overdispersion.
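A two-class numerical check of this decomposition, with made-up frequencies:

```python
# Two Poisson classes with made-up annual frequencies and equal weights.
p1, p2 = 0.5, 0.5
m1, m2 = 0.05, 0.15

m = p1 * m1 + p2 * m2                           # mean of the merged class
var = p1 * m1 + p2 * m2 + p1 * (m1 - m) ** 2 + p2 * (m2 - m) ** 2
print(m, var)                                   # 0.1 versus 0.1025: var > mean
```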

2.4.3 Consequences of Overdispersion

As mentioned in Section 2.3.14, misspecification of the variance function does not affect the consistency of $\hat\beta$, but leads to misspecification of the asymptotic variance-covariance matrix of $\hat\beta$. As a result, we suffer a loss of efficiency. Overdispersion leads to underestimated standard errors and overestimated Chi-square statistics (as demonstrated in Table 2.3), which in turn may imply artificial statistical significance for the parameters. Consequently, some explanatory variables may no longer be significant once overdispersion has been accounted for. In practice, failing to account for overdispersion might produce too many risk classes in the portfolio.

2.4.4 Modelling Overdispersion

Many explanatory variables are unknown to the insurance company or cannot be incorporated in the price list (for legal, moral or economic reasons). There are thus unobservable characteristics $Z_i$ that may influence the number of claims filed by policyholder $i$, as explained in Section 2.1.2. Of course, some $Z_{ij}$s may be correlated with the observable characteristics $X_i$. To remove these correlations, we could think of first regressing the $Z_i$s on the $X_i$s, with a linear regression model

$$Z_{ij} = \alpha_0 + \sum_{k=1}^{p} \alpha_k X_{ik} + \epsilon_{ij}.$$

Then, the score becomes

$$\beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \sum_{j=1}^{\dim(Z_i)} \gamma_j Z_{ij} = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \sum_{j=1}^{\dim(Z_i)} \gamma_j \Big(\alpha_0 + \sum_{k=1}^{p} \alpha_k X_{ik} + \epsilon_{ij}\Big) = \tilde\beta_0 + \sum_{j=1}^{p} \tilde\beta_j X_{ij} + \tilde\epsilon_i$$

for appropriate $\tilde\beta_0$, $\tilde\beta_j$s and $\tilde\epsilon_i$.
k=1


The unobserved heterogeneity (when correlated with observable characteristics) thus modifies the regression coefficients: the true effect $\beta$ of $X_i$ on $N_i$ becomes an apparent effect $\tilde\beta$. Hence, the estimated $j$th regression coefficient does not only represent the effect of the $j$th covariate on the number of claims, but also accounts for the effect of all the hidden characteristics $Z_i$ correlated with the $j$th observable one. This is why the $\hat\beta_j$s may strongly depend on which covariates are included in the model. Moreover, there remains an error term $\tilde\epsilon_i$ representing the influence of the hidden variables on $N_i$, corrected for the effect of the observed risk factors $X_i$.

For these reasons, we now consider a mixed Poisson model

$$N_i \sim \mathcal{P}oi\Big(d_i \exp\Big(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i\Big)\Big), \qquad i = 1, 2, \ldots, n, \qquad (2.10)$$

where the random variable $\epsilon_i$ represents the residual effect of the hidden characteristics. The heterogeneity is thus taken into account by assuming that the number of accidents is Poisson distributed with a mean varying from one policyholder to another.

Note that some hidden characteristics are correlated with those in $X_i$ (e.g. the hidden annual mileage and the observable use of the vehicle). The random variable $\epsilon_i$ in (2.10) models the effect of the hidden characteristics that is not already explained by $X_i$. Since $\epsilon_i$ accounts for a residual effect, we will consider in the remainder of this book that $\epsilon_i$ is independent of $X_i$. The price to pay is that the estimated regression coefficient $\hat\beta_j$ does not only express the effect of the $j$th regressor, but also the effect of all the hidden characteristics correlated with the $j$th regressor. This is important when the actuary tries to interpret the resulting price list.

The policyholders thus have different accident proneness because of observable characteristics taken into account in the price list and hidden characteristics to be corrected for a posteriori. The annual expected claim frequency becomes a random variable $\lambda_i \Theta_i$, where $\Theta_i = \exp(\epsilon_i)$ models the oscillations around the grand mean $\lambda_i$ (with $E[\Theta_i] = 1$). We can now write $N_i \sim \mathcal{P}oi(\lambda_i \Theta_i)$ with $\lambda_i = d_i \exp(\tilde{x}_i^T \beta)$. As in (1.29), we have

$$V[N_i] = \lambda_i + \lambda_i^2 V[\Theta_i] > \lambda_i = E[N_i] \qquad (2.11)$$

so that any mixed Poisson regression model induces overdispersion.

2.4.5 Detecting Overdispersion

Residual heterogeneity remains considerable within the risk classes despite the use of many a priori variables. Indeed, many explanatory variables are unknown to the insurance company and cannot be incorporated in the price list. Let us denote as $\hat{m}_k$ the empirical mean claim number of risk class $k$, and $\hat\sigma_k^2$ the associated empirical variance. In order to graphically test the Poisson mixture assumption, we have plotted the points $(\hat{m}_k, \hat\sigma_k^2)$ together with the bisecting line. The plot is displayed in Figure 2.9. We observe that $\hat{m}_k < \hat\sigma_k^2$ in numerous risk classes, indicating that the homogeneous Poisson model is inappropriate. We also see that most of the observed pairs $(\hat{m}_k, \hat\sigma_k^2)$ lie above the bisecting line, thus supporting overdispersion. The three points below the 45-degree line correspond to classes with just a few policies (36, 13 and 12, respectively).

and 12, respectively).


[Figure 2.9 appears here: scatter plot of the empirical variances against the empirical means of the claim numbers in the risk classes, together with the bisecting line.]

Figure 2.9 Mean–Variance pairs for the risk classes, Portfolio A.

A quadratic curve (without intercept) has been fitted to the mean–variance couples by weighted least-squares (the weights being the exposures of the risk classes). The high value of the $R^2$ coefficient (86.41 %) supports the quality of the fit. This shows that Equation (2.11) is supported by the data, and provides empirical evidence for a mixed Poisson model.

2.4.6 Testing for Overdispersion

The graphical test of the previous section is an easy way of detecting overdispersion, but many statistical tests of the overdispersion assumption have been developed in the literature. Testing for overdispersion can be done by testing the Poisson distribution against a mixed Poisson model. One problem with standard specification tests (such as likelihood ratio tests) occurs when the null hypothesis is on the boundary of the parameter space, as explained in Chapter 1. When a parameter is bounded by the $H_0$ hypothesis, the estimate is also bounded and the asymptotic Normality of the maximum likelihood estimator no longer holds under $H_0$. Consequently, a correction must be made.

Alternatively, testing for overdispersion can be based on the variance function. According to (2.11), the variance function of a heterogeneity model is of the form

$$V[N_i] = \lambda_i + \tau \lambda_i^2 \qquad (2.12)$$

with $\tau = V[\Theta_i]$ being the variance of the random effect. Therefore, we have to test the null hypothesis $H_0 : \tau = 0$ against $H_1 : \tau > 0$. The following score statistics can be used to test the Poisson distribution against heterogeneity models with a variance function of the form (2.12):

$$T_1 = \frac{\sum_{i=1}^{n} \big((k_i - \hat\lambda_i)^2 - k_i\big)}{\sqrt{2 \sum_{i=1}^{n} \hat\lambda_i^2}},$$

$$T_2 = \frac{\sum_{i=1}^{n} \big((k_i - \hat\lambda_i)^2 - k_i\big)}{\sqrt{\sum_{i=1}^{n} \big((k_i - \hat\lambda_i)^2 - k_i\big)^2}}$$

and

$$T_3 = \frac{\sum_{i=1}^{n} \hat\lambda_i^{-1} \big((k_i - \hat\lambda_i)^2 - k_i\big)}{\sqrt{\sum_{i=1}^{n} \hat\lambda_i^{-2} \big((k_i - \hat\lambda_i)^2 - k_i\big)^2}}.$$

All these test statistics are approximately $\mathcal{N}or(0, 1)$ distributed under $H_0$. For Portfolio A, $T_1 = 9.18$, $T_2 = 6.13$ and $T_3 = 4.38$. All the p-values are less than $10^{-4}$, leading to the rejection of the null hypothesis (and thus of the Poisson model) in favour of the mixed Poisson model.
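These statistics require only the observed counts and the fitted means. A numpy transcription, applied here to simulated overdispersed data in place of the Portfolio A fit:

```python
import numpy as np

def overdispersion_stats(k, lam):
    """Score statistics T1, T2, T3 for H0: tau = 0 (no overdispersion)."""
    k, lam = np.asarray(k, float), np.asarray(lam, float)
    u = (k - lam) ** 2 - k                      # building block of all three tests
    t1 = u.sum() / np.sqrt(2.0 * (lam ** 2).sum())
    t2 = u.sum() / np.sqrt((u ** 2).sum())
    t3 = (u / lam).sum() / np.sqrt(((u / lam) ** 2).sum())
    return t1, t2, t3

rng = np.random.default_rng(3)
lam = rng.uniform(0.05, 0.3, 10000)            # stand-in for fitted means
k = rng.poisson(lam * rng.gamma(2.0, 0.5, lam.size))   # overdispersed counts
print(overdispersion_stats(k, lam))            # typically well above 1.96
```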

2.5 Negative Binomial Regression Model

2.5.1 Likelihood Equations

Overdispersion is taken into account by the inclusion of a random effect, representing an unknown relative risk level. More precisely, assume that $\Theta_1, \ldots, \Theta_n$ are independent $\mathcal{G}am(a, a)$ distributed random variables, i.e. the common probability density function of the $\Theta_i$s is given by (1.35). In this case, $E[\Theta_i] = 1$ and $V[\Theta_i] = 1/a$. Note that the assumptions made about the $\Theta_i$s are rather strong: their common distribution means that the effect of the hidden variables does not depend on the observable ones.

Conditional on the observable characteristics summarized in the vector $x_i$ and on the random effect $\Theta_i = \theta$, the annual claim number caused by policyholder $i$ conforms to the $\mathcal{P}oi(\lambda_i \theta)$ law. In other words, $\lambda_i$ is the expected claim frequency for policyholder $i$ (based on $x_i$) and $\theta$ is the relative risk level for this policyholder (if $\theta < 1$ the policyholder is a better risk than its observable characteristics suggest, and a worse one if $\theta > 1$).


The maximum likelihood estimators of $\beta$ and $a$ solve

$$\frac{\partial}{\partial \beta} L(\beta, a) = \sum_{i=1}^{n} \tilde{x}_i \Big(k_i - \lambda_i \frac{a + k_i}{a + \lambda_i}\Big) = 0.$$

Note that these equations are similar to the ones obtained in the Poisson case except that $\lambda_i$ is now replaced with $\lambda_i (a + k_i)/(a + \lambda_i)$.

Remark 2.4 Let us give an intuitive explanation for the ratio $(a + k_i)/(a + \lambda_i)$ involved in the Negative Binomial likelihood equations. The joint probability density function of $(N_i, \Theta_i)$ equals

$$\frac{(\lambda_i \theta)^{k_i}}{k_i!} \exp(-\lambda_i \theta)\, \frac{1}{\Gamma(a)}\, a^a \theta^{a-1} \exp(-a \theta) \propto \exp(-\lambda_i \theta)\, \theta^{k_i + a - 1} \exp(-a \theta).$$

The probability density function of $\Theta_i$ given $N_i = k_i$ is

$$\frac{\exp\big(-\theta (a + \lambda_i)\big)\, \theta^{a + k_i - 1}}{\int_0^{+\infty} \exp\big(-\xi (a + \lambda_i)\big)\, \xi^{a + k_i - 1}\, d\xi} = \exp\big(-\theta (a + \lambda_i)\big)\, \theta^{a + k_i - 1}\, \frac{(a + \lambda_i)^{a + k_i}}{\Gamma(a + k_i)},$$

so that $\Theta_i$ given $N_i = k_i$ follows the $\mathcal{G}am(a + k_i, a + \lambda_i)$ distribution. Therefore,

$$E[\Theta_i \mid N_i = k_i] = \frac{a + k_i}{a + \lambda_i}$$

and the maximum likelihood estimators in the Negative Binomial regression model solve

$$\sum_{i=1}^{n} \tilde{x}_i \big(k_i - \lambda_i\, E[\Theta_i \mid N_i = k_i]\big) = 0.$$

Compared to the Poisson likelihood equations, the predicted expected claim number $\lambda_i$ is replaced with its update $\lambda_i\, E[\Theta_i \mid N_i = k_i]$ based on the information contained in the number $k_i$ of claims filed by policyholder $i$.

As already shown in Section 2.3.7, it is possible to solve the Negative Binomial likelihood equations with the help of the Newton–Raphson iterative procedure. Starting values for the Newton–Raphson iterative procedure are usually obtained as follows. The Poisson maximum likelihood estimator $\hat\beta$ is known to be consistent, so we keep it as a reasonable starting value. We still have to find an initial estimate for $\tau = V[\Theta_i] = 1/a$. To this end, we first compute the variance of $N_i$, which is given by

$$V[N_i] = E[N_i] + \tau \big(d_i \exp(\mathrm{score}_i)\big)^2,$$

and then write the empirical analogue of the last relation:

$$\sum_{i=1}^{n} \Big(\big(k_i - d_i \exp(\mathrm{score}_i)\big)^2 - k_i - \tau \big(d_i \exp(\mathrm{score}_i)\big)^2\Big) = 0.$$

Therefore, the estimator of $\tau$ is given by

$$\frac{1}{\hat{a}} = \hat\tau = \frac{\sum_{i=1}^{n} \Big(\big(k_i - d_i \exp(\widehat{\mathrm{score}}_i)\big)^2 - k_i\Big)}{\sum_{i=1}^{n} \big(d_i \exp(\widehat{\mathrm{score}}_i)\big)^2},$$

where $\widehat{\mathrm{score}}_i = \tilde{x}_i^T \hat\beta$, $\hat\beta$ being the Poisson maximum likelihood estimator of $\beta$. The estimators $\hat\beta$ and $\hat\tau$ are consistent in the Poisson mixture model, and are thus good starting values for finding the maximum likelihood estimators.
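The moment-based starting value for $\tau = 1/a$ is a one-line computation once the Poisson fit is available; a sketch on simulated data:

```python
import numpy as np

def tau_start(k, lam):
    """Method-of-moments starting value for tau = 1/a = V[Theta_i]."""
    k, lam = np.asarray(k, float), np.asarray(lam, float)
    return ((k - lam) ** 2 - k).sum() / (lam ** 2).sum()

rng = np.random.default_rng(4)
lam = rng.uniform(0.05, 0.3, 20000)            # stand-in for Poisson fitted means
theta = rng.gamma(1.0, 1.0, lam.size)          # Gam(a, a) with a = 1, so tau = 1
k = rng.poisson(lam * theta)
print(tau_start(k, lam))                       # close to 1
```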

2.5.2 Numerical Illustration

The Negative Binomial regression with categorical variables can be performed with the SAS®/STAT procedure GENMOD, which corrects the estimations for overdispersion. The final model for Portfolio A is shown in Table 2.4. The interpretation of the different columns is the same as in Section 2.3.15. Compared with the Poisson fit, we see that the estimated $\beta_j$s are very similar, but the standard errors are larger in the Negative Binomial case. The estimation of the parameter $a$ by the method of moments gives $\hat{a} = 1.2401$ whereas the maximum likelihood estimate is $\hat{a} = 1.065$. The log-likelihood is equal to −5448.5. The variance-covariance matrix of the estimated regression coefficients and the dispersion parameter is

$$\hat\Sigma = \begin{pmatrix}
0.002424 & -0.001537 & -0.001472 & -0.001042 & -0.001033 & -0.001537 & 0.000033 \\
-0.001537 & 0.003249 & 0.001692 & -0.000080 & -0.000286 & 0.000824 & 0.000022 \\
-0.001472 & 0.001692 & 0.006073 & -0.000216 & -0.000537 & 0.001077 & 0.000319 \\
-0.001042 & -0.000080 & -0.000216 & 0.002859 & 0.000217 & -0.000018 & 0.000008 \\
-0.001033 & -0.000286 & -0.000537 & 0.000217 & 0.003038 & 0.000688 & 0.000200 \\
-0.001537 & 0.000824 & 0.001077 & -0.000018 & 0.000688 & 0.006785 & -0.000019 \\
0.000033 & 0.000022 & 0.000319 & 0.000008 & 0.000200 & -0.000019 & 0.020850
\end{pmatrix}$$

The Type 3 analysis gives the following results:

Source          DF   Chi-square   Pr>Chi-sq
Gender ∗ Age     2        65.46      <.0001
(the remaining rows were lost in the extraction)


Table 2.4 Results of the Negative Binomial regression for the final model, Portfolio A (only the rows that survived the extraction are reproduced).

Variable     Level    Coeff     Std error   Wald 95 % conf limit   Chi-sq    Pr>Chi-sq
Intercept            −2.1963     0.0492     (−2.2928, −2.0998)    1990.08     <.0001
(the remaining rows were lost in the extraction)


Table 2.5 Results of the Poisson Inverse-Gaussian regression, Portfolio A (only the rows that survived the extraction are reproduced).

Variable     Level    Coeff     Std error   Wald 95 % conf limit   Chi-sq    Pr>Chi-sq
Intercept            −2.1962     0.0494     (−2.2930, −2.0995)    1979.4      <.0001
(the remaining rows were lost in the extraction)


88 <strong>Actuarial</strong> <strong>Modelling</strong> <strong>of</strong> <strong>Claim</strong> <strong>Counts</strong><br />

2.7.2 Numerical Illustration<br />

The GENMOD procedure <strong>of</strong> SAS R does not support the Poisson-LogNormal model. We<br />

have computed the regression coefficients with the aid <strong>of</strong> the NLMIXED procedure <strong>of</strong> SAS R .<br />

The results are given in Table 2.6 and the estimation of $\sigma^2$ is equal to $0.7064$, which leads to
$$
\hat{V}[\Theta_i] = \exp(\hat{\sigma}^2) - 1 = 1.027
$$
for the variance of the random effect. We observe that this value is greater than those obtained
in the Poisson-Inverse Gaussian model and in the Negative Binomial model. The resulting
regression coefficients are similar to those obtained previously. The log-likelihood is equal
to $-5448.1$ and is intermediate between the log-likelihoods of the Negative Binomial model
and of the Poisson-Inverse Gaussian model (which is the maximum). The variance-covariance
matrix of the estimated regression coefficients and $\hat{\sigma}^2$ is
$$
\widehat{\boldsymbol{\Sigma}} =
\begin{pmatrix}
0.002438 & -0.001546 & -0.001474 & -0.001047 & -0.001036 & -0.001543 & 0.000075\\
-0.001546 & 0.003270 & 0.001707 & -0.000082 & -0.000287 & 0.000831 & 0.000064\\
-0.001474 & 0.001707 & 0.006104 & -0.000224 & -0.000548 & 0.001088 & 0.000354\\
-0.001047 & -0.000082 & -0.000224 & 0.002875 & 0.000218 & -0.000022 & -0.000050\\
-0.001036 & -0.000287 & -0.000548 & 0.000218 & 0.003056 & 0.000687 & 0.000188\\
-0.001543 & 0.000831 & 0.001088 & -0.000022 & 0.000687 & 0.006825 & 0.000058\\
0.000075 & 0.000064 & 0.000354 & -0.000050 & 0.000188 & 0.000058 & 0.008056
\end{pmatrix}
$$

The Type 3 analysis for the final model gives

Source        DF   Chi-square   Pr>Chi-sq
Gender*Age     2         66.0      <.0001



2.8 <strong>Risk</strong> <strong>Classification</strong> for Portfolio A<br />

So far, we have several competing models for the observed claim frequencies in Portfolio
A. This section aims to compare these models in order to select the optimal one.

2.8.1 Comparison of Competing Models with the Vuong Test

Let us consider two non-nested competing models for the number of claims, with respective
probability mass functions $p(\cdot|x;\boldsymbol{\theta})$ and $q(\cdot|x;\boldsymbol{\eta})$, where $x$ is a vector of explanatory
variables, and $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ include the regression parameters as well as some dispersion
coefficients. The corresponding log-likelihoods are
$$
L_p(\boldsymbol{\theta}) = \sum_{i=1}^{n} \ln p(k_i|x_i;\boldsymbol{\theta})
$$
$$
L_q(\boldsymbol{\eta}) = \sum_{i=1}^{n} \ln q(k_i|x_i;\boldsymbol{\eta}).
$$

In this regression context, the test statistic proposed by Vuong (1989) is
$$
T_{LR,NN} = \frac{L_p(\hat{\boldsymbol{\theta}}) - L_q(\hat{\boldsymbol{\eta}})}{\sqrt{n}\,\hat{\omega}} \qquad (2.14)
$$
where
$$
\hat{\omega}^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \ln \frac{p(k_i|x_i;\hat{\boldsymbol{\theta}})}{q(k_i|x_i;\hat{\boldsymbol{\eta}})} \right)^2
- \left( \frac{1}{n} \sum_{i=1}^{n} \ln \frac{p(k_i|x_i;\hat{\boldsymbol{\theta}})}{q(k_i|x_i;\hat{\boldsymbol{\eta}})} \right)^2 \qquad (2.15)
$$
is the estimate of the variance of the log-likelihood difference. Neither of the models has to
be true: the test aims at selecting the model that is the closer to the true conditional
distribution. The null hypothesis of the test is that the two models are equivalent. Under the
null hypothesis, the test statistic is asymptotically Normally distributed. Rejection in favour
of $p$ happens when $T_{LR,NN} > c$, or in favour of $q$ if $T_{LR,NN} < -c$, where $c$ represents the
$\mathcal{N}(0,1)$ critical value for some significance level. If $|T_{LR,NN}| \le c$ then the null hypothesis
is not rejected and the Vuong test cannot discriminate between the two models, given the data.
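Once the pointwise log-likelihood contributions of the two fitted models are available, the statistic (2.14)-(2.15) reduces to simple portfolio sums. A minimal sketch, assuming a hypothetical data set llik with one record per policy holding the contributions lp and lq:

   /* Vuong statistic for two non-nested models */
   data _null_;
      set llik end=last;
      d = lp - lq;                 /* pointwise log-likelihood difference */
      sumd + d; sumd2 + d*d; n + 1;
      if last then do;
         omega2 = sumd2/n - (sumd/n)**2;     /* (2.15) */
         t      = sumd / sqrt(n*omega2);     /* (2.14) */
         pvalue = 2*(1 - probnorm(abs(t)));  /* two-sided p-value */
         put t= pvalue=;
      end;
   run;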

Now, comparing the three mixed Poisson models with the Vuong test gives:

• Negative Binomial against Poisson-Inverse Gaussian: the value of the test statistic is equal
to $-0.754544$, leading to a p-value of 45.06 %.
• Poisson-LogNormal against Negative Binomial: the value of the test statistic is equal to
$-0.470894$, leading to a p-value of 63.78 %.
• Poisson-LogNormal against Poisson-Inverse Gaussian: the value of the test statistic is
equal to $-0.216702$, leading to a p-value of 82.84 %.

These results do not enable us to distinguish between the three models, which are
therefore statistically equivalent. The Negative Binomial model will be used for the numerical
illustrations involving Portfolio A in the next chapters. The reason is that, in this case, explicit
expressions are available, providing a deeper insight into the mechanisms behind experience
rating systems.

2.8.2 Resulting <strong>Risk</strong> <strong>Classification</strong> for Portfolio A<br />

Table 2.7 gives the resulting price list obtained with the Negative Binomial model of
Table 2.4. A 'Yes' indicates the presence of the characteristic corresponding to the column.
The final a priori ratemaking contains 23 classes. Table 2.7 gives the estimated expected
annual claim frequencies obtained from the Negative Binomial regression model, and the
relative importance of each risk class.

Note that there is another way to present the results displayed in Table 2.7. The idea is to
start from the annual expected claim frequency of the reference class, estimated to
$$
\exp(\hat{\beta}_0) = 11.12\,\%
$$
according to Table 2.4, and then to apply correction coefficients. Specifically, the annual
expected claim frequency of a given policyholder is simply obtained from
$$
11.12\,\% \times
\begin{cases}
\exp(0.6399) = 1.90 & \text{if the policyholder is a male aged between 18 and 24}\\
\exp(0.2363) = 1.27 & \text{if the policyholder is a female aged between 18 and 30}\\
& \quad\text{or a male aged between 25 and 30}\\
1 & \text{otherwise}
\end{cases}
$$
$$
\times
\begin{cases}
\exp(-0.1805) = 0.83 & \text{if the policyholder lives in a rural district}\\
1 & \text{otherwise}
\end{cases}
$$
$$
\times
\begin{cases}
\exp(0.4783) = 1.61 & \text{if the policyholder splits the premium payment}\\
1 & \text{otherwise}
\end{cases}
$$
$$
\times
\begin{cases}
\exp(0.2145) = 1.24 & \text{if the policyholder uses the car for professional purposes}\\
1 & \text{otherwise}
\end{cases}
$$
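As a check on this multiplicative representation, consider for instance a male policyholder aged between 18 and 24, living in a rural district, paying his premium annually and using his car for private purposes only. Working with the exact coefficients rather than the rounded factors, his annual expected claim frequency is
$$
11.12\,\% \times \exp(0.6399 - 0.1805) = 11.12\,\% \times 1.583 = 17.61\,\%,
$$
which is indeed the value displayed for this risk class in Table 2.7.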

2.9 Ratemaking using Panel Data<br />

2.9.1 Longitudinal Data<br />

Actuaries often pool several observation periods to determine the price list (the main goal
being to increase the size of the database). The serial dependence arising from the fact that
the same individuals are followed and produce correlated claim numbers should prevent
actuaries from using classical statistical techniques (which assume independence).

During the observation period, $n$ policies have been in the portfolio, each one observed
during $T_i$ periods. Let $N_{it}$ be the number of claims reported by policyholder $i$ during year $t$,
$i = 1, 2, \ldots, n$, $t = 1, 2, \ldots, T_i$. Such motor insurance data have a panel structure: typically,
$n$ is large whereas the $T_i$s are small.

Let $d_{it}$ be the length of observation period $t$ for policyholder $i$. Usually, $d_{it} = 1$, but there
are a variety of situations where this is not the case. Indeed, a new period of observation
are a variety <strong>of</strong> situations where this is not the case. Indeed, a new period <strong>of</strong> observation


Table 2.7 A priori risk classification for Portfolio A (Negative Binomial regression).

Gender-age                          Use of the car         Premium split    District        Exp. annual     Weights
F 18-30 /     M 18-24    Others     Private  Professional  Annual   Split   Rural   Urban   claim freq.(%)  (%)
M 25-30
Yes   No    No      Yes  No     Yes  No     Yes  No     11.76   10.49
Yes   No    No      Yes  No     Yes  No     No   Yes    14.08   13.96
Yes   No    No      Yes  No     No   Yes    Yes  No     18.97    3.98
Yes   No    No      Yes  No     No   Yes    No   Yes    22.72    7.05
Yes   No    No      No   Yes    Yes  No     Yes  No     14.57    0.76
Yes   No    No      No   Yes    Yes  No     No   Yes    17.46    1.22
Yes   No    No      No   Yes    No   Yes    Yes  No     23.51    0.13
Yes   No    No      No   Yes    No   Yes    No   Yes    28.16    0.14
No    Yes   No      Yes  No     Yes  No     Yes  No     17.61    2.93
No    Yes   No      Yes  No     Yes  No     No   Yes    21.09    2.99
No    Yes   No      Yes  No     No   Yes    Yes  No     28.40    1.52
No    Yes   No      Yes  No     No   Yes    No   Yes    34.02    2.42
No    Yes   No      No   Yes    Yes  No     Yes  No     21.82    0.07
No    Yes   No      No   Yes    Yes  No     No   Yes    26.14    0.09
No    Yes   No      No   Yes    No   Yes    Yes  No     35.20    0.02
No    No    Yes     Yes  No     Yes  No     Yes  No      9.28   13.38
No    No    Yes     Yes  No     Yes  No     No   Yes    11.12   19.73
No    No    Yes     Yes  No     No   Yes    Yes  No     14.98    2.94
No    No    Yes     Yes  No     No   Yes    No   Yes    17.94    6.61
No    No    Yes     No   Yes    Yes  No     Yes  No     11.51    3.72
No    No    Yes     No   Yes    Yes  No     No   Yes    13.78    5.17
No    No    Yes     No   Yes    No   Yes    Yes  No     18.56    0.25
No    No    Yes     No   Yes    No   Yes    No   Yes    22.23    0.44



starts as soon as some policy characteristics are modified (think for instance of a policyholder's
house move for a company using postcode as a rating factor, a policyholder's wedding for
a company using marital status, or simply the policyholder buying a new car). Moreover,
in the year the policy is issued and in the one it is possibly cancelled, the length of the
observation period is generally less than unity.

We face a nested structure: each policyholder generates a sequence $N_i =
(N_{i1}, N_{i2}, \ldots, N_{iT_i})^T$ of claim numbers. It is reasonable to assume independence between the
series $N_1, N_2, \ldots, N_n$, but this assumption is very questionable inside the $N_i$s. Regarding a
priori ratemaking, the dependence between the components of each $N_i$ is a nuisance (in the
statistical sense). This means that, at this stage, we are not interested in accurately modelling
this dependence, but we must take it into account when estimating the regression coefficients.
The idea now is to incorporate in the $N_{it}$s exogenous information (like age, gender, power
of the car, and so on) summarized in the vectors $x_{it}$; to this end, we resort to a regression
model for longitudinal data.

The distributional assumption for the random component of the regression model has to
account for the non-negativity of the data, as well as their integer values. We begin with
Poisson regression and assume that the $N_{it}$s conform to the Poisson distribution with a
mean that can be written as an exponential function of a linear combination $\beta_0 + \sum_{j=1}^{p} \beta_j x_{itj}$
of the explanatory variables $x_{it}$, with unknown regression coefficients to be estimated
from the data. Despite its prevalence as a starting point in the analysis of count data, the
Poisson specification is often inappropriate because of unobserved heterogeneity and failure
of the independence assumption if the data consist of repeated observations on the same
policyholders. A convenient way to take this phenomenon into account is to introduce a
random effect into the model.

Remark 2.5 Before embarking on a panel analysis pooling together the observations
relating to several years, it is interesting to first work year by year to assess the stability of
the effect of each rating variable on the annual expected claim frequency. Specifically, the
vector of the regression coefficients is estimated on the basis of each calendar year and the
components are checked for their stability over time. Only stable coefficients are interesting
for the purpose of ratemaking. Rating factors with unstable regression coefficients should be
excluded from the risk classification scheme. In some cases, a time trend is visible for some
estimated regression coefficients (this is typically true for the intercept $\beta_0$). A time effect
can then be incorporated into the model to account for coefficients with trends.

2.9.2 Descriptive Statistics for Portfolio B<br />

The analysis in this section is based on an insurance portfolio containing 20 354 policies,
observed during 3 years (from 1997 to 1999). We have 45 350 observations available, as not
all the policies have been in force for 3 years. For each policy and for each year, we know
the exposure-to-risk, the number of claims filed and some other explanatory variables:

Gender: Policyholder's gender (male-female)
Age: Policyholder's age (18-22, 23-30 and over 30)
Power: The power of the vehicle (less than 66 kW, 66-110 kW and more than 110 kW)
Size: The size of the city where the policyholder lives (large, middle or small), and
Colour: The colour of the vehicle (red or other).
Colour: The colour <strong>of</strong> the vehicle (red or other).



Figure 2.10 Exposure-to-risk in Portfolio B (histogram of the frequency of policies by length of the observation period, in months, grouped in 3-month bins from 0-3 to 33-36).

The observed mean annual claim frequency is 18.4 %. Figures 2.11 to 2.15 display the
histograms giving, for each explanatory variable, the distribution of the portfolio between
the different levels of the variable and, for each of these levels, the observed mean annual
claim frequency.

Figure 2.10 gives the distribution of the exposure-to-risk in the portfolio. About 34 % of
the policies have been in force during 3 years. The exposures for policies in force for less
than three years are roughly uniformly distributed across the triennium.

The age structure of the portfolio is described in Figure 2.11. Most policyholders (60.6 %
of the portfolio) are older than 30. Only 3.0 % of the policyholders are less than 22, whereas
36.4 % are between 23 and 30. We see that the annual claim frequency decreases with
the age of the policyholders. In this portfolio, the young drivers are rather risky, as their
observed annual claim frequency is 30.8 %. Drivers over 30 have an observed annual claim
frequency of 16.3 %. Finally, the drivers aged between 23 and 30 have an observed annual
claim frequency of 20.8 %.

Figure 2.12 suggests a higher annual claim frequency for males (18.8 %) than for females
(17.7 %). The portfolio is comprised of 64.0 % male policyholders and 36.0 % female
policyholders.

Figure 2.13 gives the distribution of the policyholders according to the size of the city
where they live. We see that 31.6 % of the policyholders live in a large city, 35.0 % in a
middle-sized city and the remaining 33.4 % in a small city. The annual claim frequency
seems to increase with the size of the city (going from 16.6 % for the small cities, to 17.8 %
for the middle-sized cities and to 21.0 % for the large cities).

We see, in Figure 2.14, that most of the cars (65.6 % of the portfolio) have a power smaller
than 66 kW, 31.2 % have a power in the 66-110 kW range and only 3.2 % of the
cars have an engine with a power greater than 110 kW. We notice that the most powerful
cars are the least risky (with an annual claim frequency of 16.6 %) whereas the cars with an
engine power between 66 kW and 110 kW are the riskiest (with an observed annual
claim frequency of 18.7 %). Finally, the least powerful cars have an annual claim frequency
of 18.3 %.



Figure 2.11 Composition of Portfolio B with respect to Age (left panel: 1361 policyholders aged 18-22, 16 507 aged 23-30 and 27 482 over 30) and observed annual claim frequencies according to Age (right panel: 30.8 %, 20.8 % and 16.3 %, respectively).

Figure 2.15 shows that the colour of the car has nearly no influence on the number of
claims. We can notice that 10.1 % of the cars are red.

2.9.3 Poisson Regression with Serial Independence<br />

In this subsection, we assume that the $N_{it}$s are independent for the different values of
$i$ and $t$. With the help of the SAS/STAT procedure GENMOD, we have obtained the
results displayed in Table 2.8 for the model with the 5 explanatory variables presented in
Subsection 2.9.2. The results obtained from a Type 3 analysis are as follows:

Source    DF   Chi-square   Pr>Chi-sq
Gender     1         4.85      0.0276
Age        2       173.56      <.0001



Figure 2.12 Composition of Portfolio B with respect to Gender (left panel: 16 326 female and 29 024 male policyholders) and observed annual claim frequencies according to Gender (right panel: 17.7 % for females and 18.8 % for males).

The variable Colour is not relevant, as the p-value is 56.98 % according to the Type 3<br />

analysis. The variable Power could then also be removed (p-value <strong>of</strong> 11.21 % after having<br />

removed the variable Colour from the regression model) but since this variable is common<br />

in the tariff <strong>of</strong> insurers, we will try an interaction between the variables Age and Power.<br />

The interaction between Age and Power is illustrated in Figure 2.16 (where ‘LP’ stands<br />

for large power, that is, over 110 kW, ‘MP’ stands for medium power, that is, between 66<br />

and 110 kW, and ‘SP’ for small power, that is, less than 66 kW). The results are given in<br />

Table 2.9. The Type 3 analysis is as follows:<br />

Source       DF   Chi-square   Pr>Chi-sq
Gender        1         4.86      0.0275
Age*Power     8       180.01      <.0001



Figure 2.13 Composition of Portfolio B with respect to Size of the city (left panel: 14 331 policyholders in large cities, 15 872 in middle-sized cities and 15 147 in small cities) and observed annual claim frequencies according to Size of the city (right panel: 21.0 %, 17.8 % and 16.6 %, respectively).

Nevertheless, several levels of the interaction between Age and Power can be grouped
together. As already explained in Section 2.3.15, the option 'Estimate' of GENMOD can be
used to assess the relevance of grouping the levels of the interaction Age-Power two by two.
The following groups have been defined:

Group 1: Age > 30 and any power
Group 2: Age 23-30 and any power, as well as Age 18-22 and power > 110 kW
Group 3: Age 18-22 and power < 110 kW
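The pairwise comparisons behind this grouping can be sketched with the ESTIMATE statement; the call below is illustrative only, and the positions of the 1 and -1 coefficients must match the internal ordering of the nine interaction levels in the fitted model.

   proc genmod data=portfolioB;
      class age power citysize;
      model nclaims = age*power citysize
            / dist=poisson link=log offset=lnexpo type3;
      /* contrast two levels of Age*Power; a non-significant
         contrast supports merging the two levels */
      estimate 'MP 23-30 vs SP 23-30' age*power 0 0 0 1 0 0 -1 0 0 / exp;
   run;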

The final model is shown in Table 2.10. The variance-covariance matrix of the estimated
regression coefficients is
$$
\widehat{\boldsymbol{\Sigma}} =
\begin{pmatrix}
0.003428 & -0.000244 & -0.003022 & -0.003028 & -0.000441 & -0.000468\\
-0.000244 & 0.000679 & 0.000012 & 0.000006 & -0.000004 & 0.000007\\
-0.003022 & 0.000012 & 0.003350 & 0.003064 & -0.000084 & -0.000050\\
-0.003028 & 0.000006 & 0.003064 & 0.003436 & -0.000059 & -0.000045\\
-0.000441 & -0.000004 & -0.000084 & -0.000059 & 0.000938 & 0.000511\\
-0.000468 & 0.000007 & -0.000050 & -0.000045 & 0.000511 & 0.000964
\end{pmatrix}
$$
−0000468 0000007 −0000050 −0000045 0000511 0000964



Figure 2.14 Composition of Portfolio B with respect to Power (left panel: 29 750 cars below 66 kW, 14 149 in the 66-110 kW range and 1451 above 110 kW) and observed annual claim frequencies according to Power (right panel: 18.3 %, 18.7 % and 15.6 %, respectively).

Results of the Type 3 analysis are as follows:

Source       DF   Chi-square   Pr>Chi-sq
Gender        1         6.50      0.0108
Age*Power     2       173.14      <.0001



Figure 2.15 Composition of Portfolio B with respect to Colour of the car (left panel: 40 770 cars of another colour and 4580 red cars) and observed annual claim frequencies according to Colour of the car (right panel: 18.4 % for other colours and 18.0 % for red cars).

Table 2.8 Results of the Poisson regression for the model with 5 variables and serial independence, Portfolio B.

Variable    Level   Coeff     Std error   Wald 95 % conf limit   Chi-sq    Pr>Chi-sq
Intercept           -1.9242   0.0302      -1.9833   -1.8650      4063.54   <.0001



Figure 2.16 Composition of Portfolio B with respect to the Age-Power interaction (left panel) and annual claim frequencies according to the Age-Power interaction (right panel: 0.0 % for LP 18-22, 25.3 % for LP 23-30, 15.8 % for LP >30, 34.1 % for MP 18-22, 21.8 % for MP 23-30, 17.1 % for MP >30, 30.2 % for SP 18-22, 20.4 % for SP 23-30 and 16.0 % for SP >30).

Table 2.9 Results of the Poisson regression for the model with interactions between Age and Power, Portfolio B.

Variable          Level               Coeff      Std error   Wald 95 % conf limit   Chi-sq    Pr>Chi-sq
Intercept                             -1.9248    0.0312      -1.9859    -1.8637     3816.03   <.0001
Age*Power         18-22 / >110 kW     -16.6022   6143.464    -12057.6   12024.37    0.00      0.9978
Age*Power         18-22 / 66-110 kW   0.7596     0.1333      0.4983     1.0208      32.47     <.0001
Age*Power         >30 / 66-110 kW     0.0547     0.0357      -0.0152    0.1246      2.35      0.1250
Age*Power         >30 / <66 kW        0          0           0          0           .         .
Size of the city  Large               0.2556     0.0306      0.1956     0.3156      69.62     <.0001



Table 2.10 Results of the Poisson regression for the final model, Portfolio B.

Variable    Level   Coeff     Std error   Wald 95 % conf limit   Chi-sq   Pr>Chi-sq
Intercept           -1.2473   0.0585      -1.3620   -1.1325      453.83   <.0001



These results are then corrected with a multiplying factor computed thanks to a Poisson
regression with the number of claims of the previous year as the single explanatory variable.
We use the sum of the logarithm of the estimated annual expected claim frequency obtained
when assuming serial independence and of the logarithm of the exposure-to-risk as an offset.
Specifically, the expected number of claims for policyholder $i$ during period $t$, $t = 2, 3$, is
now of the form
$$
d_{it} \exp\Big( \hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j x_{itj} \Big) \exp\big( \tilde{\gamma}_0 + \tilde{\gamma}_1 N_{i,t-1} \big)
$$
where the $\hat{\beta}_j$s are those of Table 2.10 and where the parameters $\tilde{\gamma}_0$ and $\tilde{\gamma}_1$ have to be
estimated by Poisson regression. The offset is
$$
\ln d_{it} + \hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j x_{itj}.
$$
Results of the Poisson regression on the model incorporating the past claims are as follows:

Variable      Coeff     Std error   Wald 95 % conf limit   Chi-sq   Pr>Chi-sq
Intercept     -0.0412   0.0180      -0.0766   -0.0059      5.23     0.0222
N_{t-1}       0.3241    0.0370      0.2517    0.3966       76.88    <.0001

The corresponding Type 3 analysis gives:

Source     DF   Chi-square   Pr>Chi-sq
N_{t-1}     1        68.66      <.0001
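This auxiliary fit can be set up along the following lines, under the same hypothetical naming conventions as before, with score_hat holding the fitted a priori score and nclaims_lag1 the number of claims of the previous year:

   data panelb2;
      set panelb;
      if year > 1;               /* keep periods t = 2, 3 only     */
      off = lnexpo + score_hat;  /* offset: ln d_it + fitted score */
   run;

   proc genmod data=panelb2;
      model nclaims = nclaims_lag1 / dist=poisson link=log offset=off;
   run;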



The variance-covariance matrix of the $N_{it}$s in the Poisson model with serial independence is
given by
$$
A_i =
\begin{pmatrix}
\mu_{i1} & 0 & \cdots & 0\\
0 & \mu_{i2} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \mu_{iT_i}
\end{pmatrix}.
$$
In fact this matrix does not take the overdispersion and the serial dependence of the data
into account. Noting that
$$
\frac{\partial}{\partial\boldsymbol{\beta}} E[N_i] = A_i X_i,
$$
it is possible to transform (2.16) in order to let $A_i$ explicitly appear in the likelihood equations.
This gives
$$
\sum_{i=1}^{n} \left( \frac{\partial}{\partial\boldsymbol{\beta}} E[N_i] \right)^T A_i^{-1} \big( n_i - E[N_i] \big) = 0. \qquad (2.17)
$$
The principle of Generalized Estimating Equations (GEE) is to find a suitable variance-covariance
matrix to insert in Equation (2.17) instead of the $A_i$ based on serial independence.
This matrix should take the overdispersion and the serial dependence into account. A possible
form of this matrix could be
$$
V_i = \phi\, A_i^{1/2} R_i(\alpha) A_i^{1/2}
$$
where the 'working' correlation matrix $R_i(\alpha)$ takes the serial dependence between the
components of $N_i$ into account and depends on a parameter $\alpha$. The overdispersion is also
taken into account, as $V[N_{it}] = \phi\mu_{it}$ exceeds $E[N_{it}] = \mu_{it}$ provided $\phi > 1$.
The idea behind the GEE method is then to replace $A_i$ by $V_i$ in (2.17) and to compute the
estimator of $\boldsymbol{\beta}$ as the solution of
$$
\sum_{i=1}^{n} \left( \frac{\partial}{\partial\boldsymbol{\beta}} E[N_i] \right)^T V_i^{-1} \big( n_i - E[N_i] \big) = 0. \qquad (2.18)
$$
The resulting estimator is consistent whatever the choice of the matrix $R_i$, but the precision
will be much better if $R_i$ is close to the true correlation matrix of $N_i$. Equation (2.18)
is solved thanks to a modified version of the Fisher scoring method for $\boldsymbol{\beta}$ and a moment
estimation for $\alpha$ and $\phi$. The iterative procedure is as follows:

1. Compute an initial estimate of $\boldsymbol{\beta}$ assuming independence.
2. Compute the current 'working' correlation matrix based on standardized residuals, the current $\boldsymbol{\beta}$
and the assumed structure of $R_i$.
3. Estimate the covariance matrix $V_i$.
4. Update $\boldsymbol{\beta}$.

Note that GEE is not a likelihood-based method of estimation, so that inferences based on
likelihoods are not possible in this case.
likelihoods are not possible in this case.



Modelling Dependence with the 'Working Correlation Matrix'

The 'working' correlation matrix $R_i(\alpha)$ takes into account the dependence between the
observations corresponding to the same policyholder. The form of this matrix must be
specified and depends on the parameter vector $\alpha$.
If $R_i = I$, (2.18) gives exactly the likelihood equations (2.16) under the assumption
of independence. The SAS/STAT procedure GENMOD supports the following structures
of the 'working' correlation matrix: fixed (user-specified correlation matrix, not estimated
from the data but specified by the actuary), independent ($R_i = I$, giving the estimates
of $\boldsymbol{\beta}$ obtained under serial independence), m-dependent (correlation equal to $\alpha_t$ for lags
$t = 1, \ldots, m$, and 0 for higher lags), exchangeable (constant correlation $\alpha$, whatever the
lag), unstructured (each correlation coefficient $\alpha_{jk}$, between observations made at times $j$
and $k$, is estimated from the data), and AR(1) (autoregressive of order 1, with a correlation
coefficient equal to $\alpha^t$ at lag $t$).

Numerical Example<br />

The GEE approach can be performed thanks to the procedure GENMOD of SAS. The
'Repeated' statement of GENMOD invokes the GEE method, specifies the correlation structure,
and controls the displayed output from the GEE model. Initial parameter estimates for iterative
fitting of the GEE model are computed as in a Poisson regression model, as described
previously. Results of the initial model fit are displayed as part of the generated output
of SAS. Statistics for the initial model fit such as parameter estimates, standard errors,
deviances, and Pearson Chi-squares do not apply to the GEE model, and are only valid
for the initial model fit. The SAS parameter estimates table contains parameter estimates,
standard errors, confidence intervals, Z-scores, and p-values for the parameter estimates.
The 'Repeated' statement specifies the covariance structure of multivariate responses for
GEE model fitting in the GENMOD procedure. In addition, the 'Repeated' statement controls
the iterative fitting algorithm used in GEE and specifies optional output. Other GENMOD
procedure statements are used in the same way as they are for ordinary Poisson regression
models, to specify the regression model for the mean of the responses.
The statement 'SUBJECT=subject-effect' identifies subjects in the input data set. The
subject-effect can be a single variable, an interaction effect, a nested effect, or a combination.
Each distinct value, or level, of the effect identifies a different subject, or cluster. Responses
from different subjects are assumed to be independent, and responses within subjects are
assumed to be correlated. A subject-effect must be specified, and variables used in defining
the subject-effect must be listed in the CLASS statement. In actuarial applications, the policy
number is typically used as the subject-effect.
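Under the same hypothetical naming conventions, the GEE fit with an exchangeable working correlation might therefore be set up as follows:

   proc genmod data=panelb;
      class policyid gender agepower citysize;
      model nclaims = gender agepower citysize
            / dist=poisson link=log offset=lnexpo type3;
      /* TYPE=EXCH requests the exchangeable working correlation;
         CORRW prints the estimated working correlation matrix */
      repeated subject=policyid / type=exch corrw;
   run;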

The same variables as for the model where the serial independence was assumed are kept
in the final model. The results are given in Table 2.12, which is similar to the SAS outputs
for GEEs. The estimation of the 'working' correlation matrix gives
$$
\widehat{R} =
\begin{pmatrix}
1 & 0.0493 & 0.0460\\
0.0493 & 1 & 0.0493\\
0.0460 & 0.0493 & 1
\end{pmatrix}
$$
and $\hat{\phi} = 1.3419$.

Since the GEE approach is not based on a likelihood function, we cannot use the large
sample approximations for the estimated variance-covariance matrix of the estimated
regression coefficients presented in Chapter 1. The model-based estimation of this matrix in the
GEE case is given by $\widehat{\boldsymbol{\Sigma}} = I_{GEE}^{-1}$, where
$$
I_{GEE} = \sum_{i=1}^{n} \left( \frac{\partial}{\partial\boldsymbol{\beta}} E[N_i] \right)^T V_i^{-1} \left( \frac{\partial}{\partial\boldsymbol{\beta}} E[N_i] \right).
$$
Here, $I_{GEE}^{-1}$ is the GEE-equivalent of the inverse of the Fisher information matrix. It is a
consistent estimator of the covariance matrix of $\hat{\boldsymbol{\beta}}$ if the mean structure and the 'working'
correlation matrix are correctly specified. Then,
$$
\widehat{\boldsymbol{\Sigma}} = I_{GEE}^{-1} J_{GEE} I_{GEE}^{-1}
$$
is the robust estimate of the covariance matrix of $\hat{\boldsymbol{\beta}}$, where
$$
J_{GEE} = \sum_{i=1}^{n} \left( \frac{\partial}{\partial\boldsymbol{\beta}} E[N_i] \right)^T V_i^{-1} C[N_i] V_i^{-1} \left( \frac{\partial}{\partial\boldsymbol{\beta}} E[N_i] \right).
$$
The robust estimate is consistent even if the 'working' correlation matrix is
misspecified. In computing it, the covariance matrix $C[N_i]$ of $N_i$ is replaced with
$$
\widehat{C}[N_i] = \big( N_i - \widehat{E}[N_i] \big)\big( N_i - \widehat{E}[N_i] \big)^T.
$$
The robust estimated variance-covariance matrix of the estimated regression coefficients is
$$
\widehat{\boldsymbol{\Sigma}} =
\begin{pmatrix}
0.003910 & -0.000359 & -0.003418 & -0.003393 & -0.000509 & -0.000559\\
-0.000359 & 0.000801 & 0.000138 & 0.000081 & -0.000054 & -0.000015\\
-0.003418 & 0.000138 & 0.003755 & 0.003395 & -0.000089 & -0.000034\\
-0.003393 & 0.000081 & 0.003395 & 0.003806 & -0.000051 & -0.000029\\
-0.000509 & -0.000054 & -0.000089 & -0.000051 & 0.001100 & 0.000595\\
-0.000559 & -0.000015 & -0.000034 & -0.000029 & 0.000595 & 0.001132
\end{pmatrix}
$$
If we compare Tables 2.10 (where the serial independence was assumed) and 2.12 (where
the serial dependence is taken into account), we see that the standard errors are systematically
larger in the GEE approach, as serial dependence induces overdispersion.
Generalized score tests for Type 3 contrasts are computed for GEE models (Wald tests
are also available). Results of the Type 3 analysis are as follows:

Source       DF   Chi-square   Pr>Chi-sq
Gender        1         5.75      0.0165
Age*Power     2       125.78      <.0001



Table 2.12 Results of the Poisson regression with a GEE approach, Portfolio B.

Variable    Level   Coeff     Std error   95 % conf limit        Z        Pr > |Z|
Intercept           -1.2506   0.0625      -1.3731   -1.1280      -20.00   <.0001



Table 2.13 Results of the Negative Binomial regression model with panel data, Portfolio B.

Variable    Level   Coeff     Std error   95 % conf limit        t        Pr > |t|
Intercept           -1.2277   0.0646      -1.3542   -1.1011      -19.01   <.0001



$$
\Pr[N_{i1} = k_{i1}, \ldots, N_{iT_i} = k_{iT_i}]
= \int_0^{+\infty} \left( \prod_{t=1}^{T_i} \Pr[N_{it} = k_{it} \mid \Theta_i = \theta] \right)
\frac{1}{\sqrt{2\pi\tau\theta^3}} \exp\left( -\frac{(\theta - 1)^2}{2\tau\theta} \right) d\theta.
$$
Again, numerical methods are needed to obtain the solutions and the SAS/STAT procedure
NLMIXED can be used to maximize the log-likelihood.

The fit of the Poisson-Inverse Gaussian model is described in Table 2.14. We get
$\hat{\tau} = 0.5575$. The variance-covariance matrix of the estimated regression coefficients and
dispersion parameter $\hat{\tau}$ is
$$
\widehat{\boldsymbol{\Sigma}} =
\begin{pmatrix}
0.004216 & -0.000289 & -0.003723 & -0.003709 & -0.000534 & -0.000567 & 0.000115\\
-0.000289 & 0.000830 & -0.000002 & -0.000004 & 0.000000 & 0.000009 & -0.000002\\
-0.003723 & -0.000002 & 0.004118 & 0.003755 & -0.000097 & -0.000058 & -0.000027\\
-0.003709 & -0.000004 & 0.003755 & 0.004193 & -0.000069 & -0.000052 & 0.000011\\
-0.000534 & 0.000000 & -0.000097 & -0.000069 & 0.001152 & 0.000616 & 0.000030\\
-0.000567 & 0.000009 & -0.000058 & -0.000052 & 0.000616 & 0.001169 & -0.000010\\
0.000115 & -0.000002 & -0.000027 & 0.000011 & 0.000030 & -0.000010 & 0.002492
\end{pmatrix}
$$
The log-likelihood is equal to $-19\,643.7$ and the Type 3 analysis gives

Source       DF   Chi-square   Pr>Chi-sq
Gender        1          5.4      0.0201
Age*Power     2        148.8      <.0001



$$
\Pr[N_{i1} = k_{i1}, \ldots, N_{iT_i} = k_{iT_i}]
= \int_0^{+\infty} \left( \prod_{t=1}^{T_i} \exp(-\lambda_{it}\theta) \frac{(\lambda_{it}\theta)^{k_{it}}}{k_{it}!} \right)
\frac{1}{\theta\sigma\sqrt{2\pi}} \exp\left( -\frac{(\ln\theta + \sigma^2/2)^2}{2\sigma^2} \right) d\theta.
$$
The integral has no closed-form solution, so that it is not possible to derive the log-likelihood
equations. Therefore, numerical procedures are needed to solve the integral and to find
the maximum likelihood estimates (the NLMIXED procedure of SAS/STAT, for instance).
Now,
$$
E[N_{it}] = \lambda_{it} \quad \text{and} \quad V[N_{it}] = \lambda_{it} + \big( \exp(\sigma^2) - 1 \big) \lambda_{it}^2.
$$

The fit of the Poisson-LogNormal model is described in Table 2.15. We get $\hat{\sigma}^2 = 0.4581$.
The variance-covariance matrix of the estimated regression coefficients and dispersion
parameter $\hat{\sigma}^2$ is
$$
\widehat{\boldsymbol{\Sigma}} =
\begin{pmatrix}
0.004225 & -0.000291 & -0.003729 & -0.003714 & -0.000536 & -0.000569 & 0.000091\\
-0.000291 & 0.000832 & -0.000001 & -0.000004 & -0.000001 & 0.000009 & -0.000003\\
-0.003729 & -0.000001 & 0.004125 & 0.003760 & -0.000097 & -0.000058 & -0.000022\\
-0.003714 & -0.000004 & 0.003760 & 0.004199 & -0.000069 & -0.000052 & 0.000006\\
-0.000536 & -0.000001 & -0.000097 & -0.000069 & 0.001155 & 0.000618 & 0.000022\\
-0.000569 & 0.000009 & -0.000058 & -0.000052 & 0.000618 & 0.001173 & -0.000007\\
0.000091 & -0.000003 & -0.000022 & 0.000006 & 0.000022 & -0.000007 & 0.001193
\end{pmatrix}
$$
The log-likelihood is $-19\,642.3$ and the Type 3 analysis gives

Source       DF   Chi-square   Pr>Chi-sq
Gender        1          5.2      0.0226
Age*Power     2        148.8      <.0001



2.9.9 Vuong Test<br />

In the framework of panel data, the test proposed in Section 2.8.1 is no longer valid. This
test can be modified by using the conditional log-likelihood defined, for policyholder $i$ and
for the Negative Binomial model, as
$$
l^{NB}_i(k_{it} \mid k_{i1}, \ldots, k_{i,t-1}) = \ln \Pr[N_{it} = k_{it} \mid N_{i1} = k_{i1}, \ldots, N_{i,t-1} = k_{i,t-1}]
$$
$$
= \ln \frac{\Pr[N_{i1} = k_{i1}, \ldots, N_{it} = k_{it}]}{\Pr[N_{i1} = k_{i1}, \ldots, N_{i,t-1} = k_{i,t-1}]}
$$
$$
= \ln \frac{\int_0^{+\infty} \prod_{s=1}^{t} \exp(-\lambda_{is}\theta)(\lambda_{is}\theta)^{k_{is}}\, f(\theta)\, d\theta}
{k_{it}! \int_0^{+\infty} \prod_{s=1}^{t-1} \exp(-\lambda_{is}\theta)(\lambda_{is}\theta)^{k_{is}}\, f(\theta)\, d\theta}
$$
where $f$ is given by (1.35). Therefore, for panel data, the log-likelihood can be equivalently
rewritten as
$$
L = \sum_{i=1}^{n} \sum_{t=1}^{T_i} l^{NB}_i(k_{it} \mid k_{i1}, \ldots, k_{i,t-1})
$$
where, for $t = 1$, $l^{NB}_i$ is simply $\ln \Pr[N_{i1} = k_{i1}]$. The quantities $l^{PIG}_i$ and $l^{PLN}_i$ are
defined equivalently for $\Theta_i$s following the Inverse Gaussian and LogNormal distributions,
substituting for $f$ the probability density functions (1.41) and (1.45), respectively.

Golden (2003) extended the test proposed by Vuong (1989) to the serial case. It can
be applied here to compare the different mixed Poisson alternatives provided the panel is
balanced (that is, the $T_i$s are equal for all the policyholders). In the application to Portfolio
B, we keep the policies in the portfolio for three periods (that is, those for which $T_i = T = 3$).
There are 9894 such policies out of a total of 20 354 policies in the complete portfolio. The
test statistic for comparing the Negative Binomial model to the Poisson-Inverse Gaussian
model is then
$$
T_{ext} = \frac{\sum_{i=1}^{n} \sum_{t=1}^{T} \Big( l^{NB}_i(k_{it} \mid k_{i1}, \ldots, k_{i,t-1}) - l^{PIG}_i(k_{it} \mid k_{i1}, \ldots, k_{i,t-1}) \Big)}{\sqrt{nT}\, \hat{\omega}}
$$
with the variance defined as
$$
\hat{\omega}^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T}
\Big( l^{NB}_i(k_{it} \mid k_{i1}, \ldots, k_{i,t-1}) - l^{PIG}_i(k_{it} \mid k_{i1}, \ldots, k_{i,t-1}) \Big)
\Big( l^{NB}_i(k_{is} \mid k_{i1}, \ldots, k_{i,s-1}) - l^{PIG}_i(k_{is} \mid k_{i1}, \ldots, k_{i,s-1}) \Big).
$$
This expression for the variance is close to the one obtained in the independent case, except
that covariances between correlated observations appear here. Note that several assumptions
have to be checked before the test can be formally performed. We refer the reader to Golden
(2003) for more details. Here, we proceed as in French & Jones (2004) and apply the test
directly. The following results are obtained:



• When we compare the Poisson-Gamma model with the Poisson-Inverse Gaussian model,
the value of the test statistic is equal to $-0.7988$, leading to a p-value of 42.44 %.
• When we compare the Poisson-Gamma model with the Poisson-LogNormal model, the
value of the test statistic is equal to $-0.7688$, leading to a p-value of 44.20 %.
• When we compare the Poisson-Inverse Gaussian model with the Poisson-LogNormal
model, the value of the test statistic is equal to $-0.5922$, leading to a p-value of
55.37 %.

Therefore, the three models are not statistically different. Applying the same testing procedure<br />

to the policies that are in the portfolio for the first two years (this means 12 202 policies)<br />

yields the same conclusion.<br />

2.9.10 Information Criteria<br />

Non-nested models are often compared using likelihood-based criteria, including the
well-known AIC, for instance. Since the competing models have the same number of
parameters, it is enough to examine the respective log-likelihoods. A commonly used
rule-of-thumb consists in considering that two models are significantly different if the
difference in the log-likelihoods exceeds five (corresponding to a difference in AICs of
more than ten, as discussed in Burnham & Anderson (2002)). This means here that
the Poisson-Inverse Gaussian and Poisson-LogNormal models are significantly better than
the Negative Binomial model. Considering the maximum of the log-likelihood, we chose
the Poisson-LogNormal model for Portfolio B. The same conclusion is obtained using
different criteria often used in practice. For instance, Raftery (1995) suggested that a
model significantly outperforms a competitor if the difference in their respective BIC values
exceeds 5.
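With the log-likelihoods reported above, the rule-of-thumb works out as follows. The three mixed Poisson models have the same number of parameters, so differences in AIC are twice the differences in log-likelihood; comparing the Poisson-LogNormal model with the Poisson-Inverse Gaussian model gives
$$
2 \times \big( -19\,642.3 - (-19\,643.7) \big) = 2.8 < 10,
$$
so these two models cannot be separated by the criterion, and the Poisson-LogNormal model is retained because it attains the larger log-likelihood.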

2.9.11 Resulting <strong>Classification</strong> for Portfolio B<br />

Table 2.16 gives the resulting price list obtained with the Poisson-LogNormal model<br />

described in Table 2.15. The final a priori ratemaking contains 18 classes. This ratemaking<br />

will be used throughout the text for the examples involving Portfolio B.<br />

There is another way to present the results displayed in Table 2.16. The annual expected
claim frequency is obtained from the reference class, with
$$
\exp(\hat{\beta}_0) = 29.50\,\%
$$
according to Table 2.15. Applying correction coefficients, we then obtain the annual expected
claim frequency of any policyholder. Specifically, it is obtained from the multiplicative
formula
formula



Table 2.16 A priori risk classification for Portfolio B (Poisson-LogNormal regression model).

Age-Power                      Size of the city           Gender          Annual          Weights
Group 1   Group 2   Group 3    Large   Middle   Small     Female   Male   claim freq.(%)  (%)
No        No        Yes        No      No       Yes       No       Yes    29.50            0.82
No        No        Yes        No      No       Yes       Yes      No     27.59            0.43
No        No        Yes        No      Yes      No        No       Yes    31.73            0.63
No        No        Yes        No      Yes      No        Yes      No     29.68            0.37
No        No        Yes        Yes     No       No        No       Yes    38.33            0.44
No        No        Yes        Yes     No       No        Yes      No     35.86            0.31
No        Yes       No         No      No       Yes       No       Yes    19.60            7.88
No        Yes       No         No      No       Yes       Yes      No     18.33            4.64
No        Yes       No         No      Yes      No        No       Yes    21.08            8.32
No        Yes       No         No      Yes      No        Yes      No     19.72            4.66
No        Yes       No         Yes     No       No        No       Yes    25.47            6.89
No        Yes       No         Yes     No       No        Yes      No     23.82            3.99
Yes       No        No         No      No       Yes       No       Yes    15.19           12.59
Yes       No        No         No      No       Yes       Yes      No     14.20            7.02
Yes       No        No         No      Yes      No        No       Yes    16.34           13.81
Yes       No        No         No      Yes      No        Yes      No     15.28            7.24
Yes       No        No         Yes     No       No        No       Yes    19.74           12.68
Yes       No        No         Yes     No       No        Yes      No     18.46            7.30

$$
29.50\,\% \times
\begin{cases}
\exp(-0.0669) = 0.94 & \text{if the policyholder is a female driver}\\
1 & \text{otherwise}
\end{cases}
$$
$$
\times
\begin{cases}
\exp(-0.6640) = 0.51 & \text{if the policyholder belongs to Group 1 with respect to}\\
& \quad\text{the combined variable Age*Power}\\
\exp(-0.4089) = 0.66 & \text{if the policyholder belongs to Group 2 with respect to}\\
& \quad\text{the combined variable Age*Power}\\
1 & \text{otherwise}
\end{cases}
$$
$$
\times
\begin{cases}
\exp(0.2620) = 1.30 & \text{if the policyholder resides in a large city}\\
\exp(0.0731) = 1.08 & \text{if the policyholder resides in a middle-sized city}\\
1 & \text{otherwise}
\end{cases}
$$
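As a quick consistency check with the exact coefficients, take a female policyholder belonging to Group 1 and living in a large city:
$$
29.50\,\% \times \exp(-0.0669 - 0.6640 + 0.2620) = 29.50\,\% \times 0.626 = 18.46\,\%,
$$
which is the value displayed for this risk class in Table 2.16.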

2.10 Further Reading and Bibliographic Notes<br />

2.10.1 Generalized Linear Models<br />

After decades dominated by statistically unsophisticated models, it is now common practice
since McCullagh & Nelder (1989) and Brockman & Wright (1992) to achieve a priori
classification with the help of generalized linear models (GLMs). They are so called because
they generalize the classical linear models based on the Normal distribution.



<strong>Risk</strong> classification techniques for claim counts have been the topic <strong>of</strong> many papers<br />

appearing in the actuarial literature. Early references include ter Berg (1980b) and<br />

Albrecht (1983a,b,c). Dionne & Vanasse (1989, 1992) used a Negative Binomial<br />

regression model, while Dean, Lawless & Willmot (1989) used a Poisson-Inverse<br />

Gaussian distribution to fit the number <strong>of</strong> claims. Cummins, Dionne, McDonnald &<br />

Pritchett (1990) applied the GB2 family <strong>of</strong> distributions in modelling claim counts.<br />

Ter Berg (1996) considered the Generalized Poisson distribution <strong>of</strong> Consul (1990) and<br />

incorporated explanatory variables with the help <strong>of</strong> a loglinear model. There are now several<br />

textbooks devoted to the statistical analysis <strong>of</strong> count data. Let us mention Cameron &<br />

Trivedi (1998) and Winkelmann (2003). Before the Poisson regression became popular<br />

among actuaries, claims data were <strong>of</strong>ten analysed using logistic regression; see, e.g.,<br />

Beirlant, Derveaux, De Meyer, Goovaerts, Labies & Maenhoudt (1991).<br />

Separate analyses are usually conducted for claim frequencies and costs, including<br />

expenses, to arrive at a pure premium. With the noticeable exception <strong>of</strong> Jorgensen<br />

& Paes de Souza (1994), all the actuarial analyses <strong>of</strong> the pure premium so far have<br />

examined frequencies and severities separately. This approach is particularly relevant in<br />

motor insurance, where the risk factors influencing the two components <strong>of</strong> the pure premium<br />

are usually different.<br />

2.10.2 Nonlinear Effects<br />

GLMs however only deal with categorical risk factors in an efficient way. The main drawback<br />

<strong>of</strong> GLMs is that covariate effects are modelled in the form <strong>of</strong> a linear predictor. GLMs are<br />

too restrictive if nonlinear effects <strong>of</strong> continuous covariates are present. Continuous covariates<br />

can efficiently enter GLMs only if they are suitably transformed to reflect their true effect<br />

on the score scale. However, it is not always clear how the variables should be transformed.<br />

It has been common practice in insurance companies to model possibly nonlinear effects<br />

<strong>of</strong> a covariate by polynomials. However, it is well known to statisticians that polynomials<br />

are <strong>of</strong>ten not flexible enough to capture the variability <strong>of</strong> the data particularly when the<br />

polynomial degree is small (see, e.g., Fahrmeir & Tutz (2001), Chapter 5). For larger<br />

degrees the flexibility <strong>of</strong> polynomials increases but at the cost <strong>of</strong> possibly high variability<br />

<strong>of</strong> resulting estimates particularly at the left and right extreme values <strong>of</strong> the covariate.<br />

A more flexible approach for modelling nonlinear effects can be based on piecewise<br />

polynomials. More specifically, the unknown functions are approximated by polynomial<br />

splines, which may be regarded as piecewise polynomials with additional regularity<br />

conditions (see, e.g., de Boor, 1978). We refer the interested reader to Denuit & Lang<br />

(2004) for an overview <strong>of</strong> the existing approaches.<br />

2.10.3 Zero-Inflated Models<br />

Insurance data usually include a relatively large number <strong>of</strong> zeros (no claim). Deductibles<br />

and no claim discounts increase the proportion <strong>of</strong> zeros, since small claims are not reported<br />

by insured drivers. Zero-inflated models, including the Zero-Inflated Poisson (ZIP) model,<br />

account for this phenomenon.<br />

ZIP models can be considered as a mixture <strong>of</strong> a zero point mass and a Poisson distribution<br />

and were first used to study soldering defects on printed wiring boards (Lambert, 1992). To



account for overdispersion in the Poisson part, generalizations <strong>of</strong> the model are possible and<br />

include the Zero-Inflated Negative Binomial (ZINB) distribution. See Yip & Yau (2005) for<br />

an application to insurance claim count data.<br />

Other than the zero-inflated models, parametric methods such as the mixture <strong>of</strong><br />

distributions can be used to model the claim frequency distribution with extra zeros.<br />

Hürlimann (1990) discussed the use <strong>of</strong> several pseudo compound Poisson distributions in<br />

modelling the claim count data. To test for a Poisson mixture, a test statistic was proposed<br />

by Carrière (1993a).<br />

2.10.4 Fixed Versus Random Effects<br />

The mixed Poisson distribution is <strong>of</strong>ten used to account for unknown characteristics <strong>of</strong> the<br />

driver, influencing the number <strong>of</strong> accidents reported to the company. When panel data are<br />

available, these hidden features can alternatively be captured by an individual heterogeneity<br />

term that is constant over time (the standard reference for panel data is Hsiao (2003); the<br />

particular case <strong>of</strong> count variables is treated in Cameron & Trivedi (1998)). Boucher &<br />

Denuit (2006) compared the two approaches with emphasis on the actual meaning <strong>of</strong> the<br />

estimated parameters in a mixed Poisson regression when random effects and covariates are<br />

correlated. In such a case, parameter estimates should be seen as the apparent effects <strong>of</strong> the<br />

covariates on the frequency. Keeping this in mind allows for a better understanding <strong>of</strong> the<br />

resulting price list.<br />

The results obtained by Boucher & Denuit (2006) justify the use of random effects

models even if there exists a correlation between the regressors and the heterogeneity. The<br />

parameter estimates do not identify the impact <strong>of</strong> these regressors on the premium but only<br />

the apparent effects. Since this is usually the focus for the actuary in ratemaking, there<br />

is no problem with this interpretation. However, such a correlation clearly indicates that a<br />

correction should be done to obtain a more accurate model. In particular, the apparent high<br />

risk <strong>of</strong> young drivers should deserve some attention. The analysis conducted by Boucher<br />

& Denuit (2006) shows that the fixed effects are very heterogeneous for these individuals.<br />

Instead <strong>of</strong> penalizing these insureds in the a priori ratemaking, an appropriate bonus-malus<br />

scheme could be designed. Merit rating systems improve the fairness <strong>of</strong> the tariff in that<br />

respect. We will come back to this issue in Chapter 8.<br />

2.10.5 Hurdle Models<br />

Boucher, Denuit & Guillén (2006) presented and compared different risk classification<br />

models for the annual number <strong>of</strong> claims reported to the insurer. Generalized heterogeneous,<br />

zero-inflated, hurdle and compound frequency models are applied to a sample <strong>of</strong> an<br />

automobile portfolio <strong>of</strong> a major company operating in Spain.<br />

The hurdle models are widely used in connection with health care demands. An application<br />

to credit scoring is proposed in Dionne, Artis & Guillén (1996). With health care demand,<br />

it is generally accepted that the demand for certain types <strong>of</strong> health care services depends on<br />

two processes: the decisions <strong>of</strong> the individual and those <strong>of</strong> the health care provider. See, e.g.<br />

Pohlmeier & Ulrich (1995) or Santos Silva & Windmeijer (2001). The hurdle model<br />

also possesses a natural interpretation for the number <strong>of</strong> reported claims. A reason for the<br />

good fit <strong>of</strong> the zero-inflated models is certainly the reluctance <strong>of</strong> some insureds to report their



accident (since they would then be penalized by some bonus-malus scheme implemented by<br />

the insurer). It is reasonable to believe that the behaviour <strong>of</strong> the insureds is not the same<br />

when they have already reported a claim. This suggests that two processes govern the total<br />

number <strong>of</strong> claims, as in the hurdle model.<br />

2.10.6 Geographic Ratemaking

It is common in motor insurance to let the risk premium per unit of exposure vary with the geographic area when all other risk factors are held constant. Most companies have adopted a risk classification according to the geographical zone where the policyholder lives (urban / nonurban for instance, or a more accurate split of the country according to Zip codes). The spatial variation may be related to geographic factors (e.g. traffic density or proximity to arterial roads) or to socio-demographic factors. In such cases it is desirable to estimate the spatial variation in the risk premium and to price accordingly. Spatial postcode methods for insurance rating attempt to extract information in addition to that contained in standard factors (like age or gender, for instance). Often, claim characteristics tend to be similar in neighbouring postcode areas (after other factors have been accounted for). The idea of postcode rating models is to exploit this spatial smoothness by allowing for information transfer to and from neighbouring regions.

Following Boskov & Verrall (1994) and Brouhns, Denuit, Masuy & Verrall (2002), the risk associated with each district can be assessed with the help of statistical models for spatial data. The techniques used for geographic ratemaking are closely related to those used in disease mapping by epidemiologists. Figure 2.17, taken from Brouhns, Denuit, Masuy & Verrall (2002), displays the raw exposures e_i for each area in Belgium (the number of policy-years, in our case), whilst Figure 2.18 shows crude claim rates (observed number of claims divided by the corresponding expected number of claims given the characteristics of the policyholders living in each area). However, the latter map is at best difficult to interpret, and can even be seriously misleading, because crude claim rates tend to be far more extreme in regions with smaller risk exposures (see Figures 2.17–2.18: the high rates in the north-west of Belgium (West Flanders) or in the south of the country correspond to districts for which we have fewer policyholders). Hence regions with the least reliable data will typically draw the main visual attention. This is one reason why it is difficult in practice to attempt any smoothing or risk assessment ‘by eye’, as the following simulation illustrates.
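The sketch below (Python; the district sizes and claim frequency are invented for illustration) draws Poisson claim counts for districts whose true relative risk equals 1 everywhere. The dispersion of the crude rates grows as the exposure shrinks, so the most extreme crude rates come from the smallest districts even though all districts share the same underlying risk.

import numpy as np

rng = np.random.default_rng(seed=1)
exposures = np.array([50, 100, 500, 2500])  # policy-years per district (hypothetical)
frequency = 0.10                            # expected claims per policy-year (hypothetical)
n_sim = 10_000

for e in exposures:
    counts = rng.poisson(e * frequency, size=n_sim)
    crude = counts / (e * frequency)        # observed / expected; the true value is 1
    print(f"exposure {e:5d}: std of crude rate = {crude.std():.3f}")
# The standard deviation behaves like 1 / sqrt(e * frequency), so the
# least reliable districts are exactly the ones that stand out on the map.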

Whereas epidemiologists and environmetricians have been interested in spatial models for a long time, the actuarial literature is rather poor in respect of ratemaking methods incorporating geographic components. Taylor (1989) used two-dimensional splines on a plane linked to the map of the region by a transformation chosen to match the features of the specific region, and applied this method to a data set from Sydney, Australia. Boskov & Verrall (1994) highlighted some deficiencies in Taylor's model, and provided an alternative treatment which made use of the Gibbs sampler to implement a Bayesian revision of the observation in each area. The main advantage of the Bayesian framework is that it recognizes the magnitude of sampling error and incorporates the concept of smoothing over neighbouring areas. Taylor (1996) adopted a similar point of view and applied Whittaker graduation (a widely accepted actuarial technique which has also been shown to have a Bayesian interpretation). Dixon, Kelsey & Verrall (2000) proposed an extension of the Boskov & Verrall (1994) model including weighting factors accounting for distances between regions.

Figure 2.17 Map of Belgium with exposures-to-risk. (Legend: raw exposure in policyholders, from 48 to 621.)

A key step in model specification is the definition of neighbours, i.e. those areas whose claim rates are correlated with that of a given area. A traditional definition of neighbours includes all areas contiguous to a given area. The methodology applied in Brouhns, Denuit, Masuy & Verrall (2002) is as follows. In a first stage, available explanatory variables are incorporated into the policyholders' claim frequencies with the help of a (mixed) Poisson regression model. In a second stage, the data are aggregated by district and overdispersion is accounted for by the introduction of a random effect (split into a spatially structured part and a spatially unstructured one). The Boskov and Verrall model is then used to recover the spatial structure of the claims pattern, which can be used to design the geographical ratemaking strategy of the company.

In order to figure out the global claim pattern, Figure 2.19 displays the geographical risk variation and serves as a basis for the determination of the rating areas. Clearly, several regions with high, medium and low values of geographic risk emerge. This map can thus serve to design different rating areas, which in turn become a categorical variable that may enter the (mixed) Poisson regression model.

Figure 2.18 Map of Belgium with crude rates. (Legend: crude claim rates from 0.0695 to 0.1554.)

The approach, however, proceeds in two steps: first, a regression is performed with all the covariates except the spatial ones, to get the expected claim number for each area; then the Boskov and Verrall model recovers the spatial claim pattern. In order to avoid this preprocessing of the data to remove the effect of all risk factors other than the spatial ones, Dimakos & Rattalma (2002) proposed a fully Bayesian approach to nonlife ratemaking. This approach still relies on GLMs and thus suffers from the drawbacks mentioned in Section 2.10.2: continuous covariates such as policyholders' age enter linearly into the model (on the score scale), whereas it is now well established that the effect of some continuous variables is far from linear (typically, convex for policyholders' age).

Statistical modelling tools to perform space-time analysis of insurance data have been proposed by Denuit & Lang (2004) and Fahrmeir, Lang & Spies (2003). This approach enables the actuary to explore spatial and temporal effects simultaneously with the impact of other covariates. Bayesian generalized additive models provide a broad and flexible framework for regression analyses in realistically complex situations with cross-sectional, longitudinal and spatial data. All effects, as well as smoothing parameters, are regarded as random and are assigned appropriate priors.

Figure 2.19 Estimation of the geographically structured risk.

2.10.7 Software

SAS® has been used throughout this chapter. Readers interested in the use of this software to perform statistical analyses are referred to Der & Everitt (2002) for a very readable introduction. For more information, we refer the interested reader to the SAS website, http://www.sas.com/.

A number of software packages specific to the insurance industry have been developed, some of them based on SAS®. We mention a few of them hereafter, without being exhaustive. Note also that the authors did not test these tools, which are not available free of charge, so we could not evaluate their relative performance.

The Tricast suite has been developed using support software from SAS® and includes actuarial tools to create rating models. For more information, the interested reader may email tricast@tricast-group.com. The consulting firm Watson Wyatt offers the Pretium® system, which integrates with SAS®. For more details, we refer the reader to http://www.watsonwyatt.com/. EMB has developed several computer tools for insurance rating: the Emblem® software can be used to model claims experience, and the Classifier™ software allows one to assess and categorize geographically distributed risk. For more details, see http://www.emb.co.uk/.

Academic researchers and research and development actuaries appreciate the software system R, a free language and environment for statistical computing and graphics. R is a GNU project similar to the S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues; R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R is available as Free Software, in source code form, under the terms of the Free Software Foundation's GNU General Public License. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS. For more details, we refer the interested reader to http://www.r-project.org/. A good reference about the use of R for topics related to this chapter is certainly Faraway (2006). Readers completely new to R are referred to Dalgaard (2002) for an introduction.


Part II

Basics of Experience Rating



3

Credibility Models for Claim Counts

3.1 Introduction

3.1.1 From Risk Classification to Experience Rating

We have seen in Chapter 2 how to partition a heterogeneous portfolio into more homogeneous classes, with all policyholders belonging to the same class paying the same premium. However, tariff cells remain quite heterogeneous despite the use of many a priori variables. The expected claim frequency for a tariff cell is designed to reflect the average experience of the entire group. If the experience of a policy is consistently better (or worse) than the average experience of the group, the insurance company may consider adapting the amount of premium to be charged for this policy. Of course, this requires a model which can separate random variation from signal in the historical data, to indicate whether this policy is of better (or worse) quality compared to the group average.

It is reasonable to believe that the hidden features (unobserved risk characteristics that have been modelled by a random effect in the mixed Poisson regression model) are revealed by the number of claims reported by the policyholders over the successive insurance periods. Hence, the premium is adjusted on the basis of the individual claims experience in order to restore fairness among policyholders. The allowance for the history of the policyholder in a rating model thus derives from the interpretation of the serial correlation in longitudinal data as resulting from hidden features in the risk distribution.

3.1.2 Credibility Theory

Credibility theory is the art of combining different collections of data to obtain an accurate overall estimate. It provides actuaries with techniques to determine insurance premiums for contracts that belong to a (more or less) heterogeneous portfolio, where there is limited or irregular claim experience for each contract but ample claim experience for the portfolio. Credibility theory can be seen as a set of quantitative tools that allows the insurer to perform experience rating, that is, to adjust future premiums based on past experience. In many cases, a compromise estimator is derived from a convex combination of a prior mean and the mean of the current observations. The weight given to the observed mean is called the credibility factor (since it fixes the extent to which the actuary may be confident in the data).

3.1.3 Limited Fluctuation Theory

There are different types of credibility mechanisms: limited fluctuation credibility and greatest accuracy credibility. Limited fluctuation credibility theory was developed in the early part of the 20th century, in connection with workers' compensation insurance, by Mowbray (1914). It provides a mechanism for assigning full or partial credibility to a policyholder's experience. In the former case, the policy is rated on the basis of its own claims history, whereas in the latter case a weighted average of past experience and the grand mean is used by the insurer. Although the limited fluctuation approach provides simple solutions to the problem, it suffers from a lack of theoretical justification. We will not consider this approach in this book. Instead, we will consider the greatest accuracy credibility theory formalized by Bühlmann (1967, 1970).

3.1.4 Greatest Accuracy Credibility

The idea behind greatest accuracy credibility theory can be summarized as follows. Tariff cells include policyholders with similar underwriting characteristics; each cell is viewed as homogeneous with respect to the underwriting characteristics used by the insurance company. Of course, the risks in the cell are not truly homogeneous: there still remains some heterogeneity in each of the tariff cells, as explained in the preceding chapters. To reflect this heterogeneity, the relative risk level of each policyholder in the rating cell is characterized by a risk parameter Θ, but the value θ of Θ varies by policyholder. If θ = 50 % then the expected number of claims reported by this policyholder is half of the claim frequency corresponding to the rating cell, whereas if θ = 300 % then the expected number of claims for this individual is three times the claim frequency of the rating cell. Of course, even if assuming the existence of such a Θ is reasonable, it is not observable and the actuary can never know its true value for a given policyholder.

Because Θ varies by policyholder, there is a distribution function F_Θ(θ) giving the proportion of policyholders in the portfolio with relative risk level less than or equal to the threshold θ. Stated another way, F_Θ(θ) represents the probability that a policyholder picked at random from the portfolio has a risk parameter Θ that is less than or equal to θ. The connection with the random effect introduced in the statistical models of Chapter 2 to account for the residual heterogeneity is now clear: this random effect becomes the random risk parameter Θ for a policyholder picked at random from the portfolio (the distribution function of Θ is F_Θ). Even if the risk parameter Θ remains unknown, the distribution function F_Θ can be estimated from data, as explained in Chapter 2. Once estimated, the heterogeneity model can be used to perform prediction on longitudinal data and allows for experience rating in motor insurance. In an empirical Bayesian setting, the prediction is derived from the expectation of a random effect with respect to a posterior distribution taking into account the history of the individual.

3.1.5 Linear Credibility

Bayesian statistics offers an intellectually acceptable approach to greatest accuracy credibility theory. Nevertheless, practical applications involve numerical methods to perform integration with respect to the a posteriori distribution, making more elementary approaches desirable (at least to get an easy-to-compute approximation of the result). In that respect, linear credibility formulas are especially useful. Basically, the actuary still resorts to a quadratic loss function, but the shape of the credibility predictor is constrained ex ante to be linear.

3.1.6 Financial Equilibrium

When the insurer uses past claims history to reevaluate the amount of premium to be charged to the policyholders, the increases and decreases granted to the policyholders in the portfolio must exactly balance each other. The credibility mechanism indeed has little effect on the number of claims filed with the company, so the number of claims with and without credibility is very much the same.

For this reason, we expect the a posteriori corrections to average to unity. This ensures that the average number of claims without credibility equals the average number of claims with credibility, and that the premium income will be enough to compensate the claims.

3.1.7 Combining a Priori and a Posteriori Ratemaking

The amount of premium paid by the policyholder depends on the rating factors of the current period (think for instance of the type of the car or of the occupation of the policyholder) but also on the claim history. The insurance premium is the product of a base premium and a credibility coefficient. The base premium is a function of the current rating factors, whereas the credibility coefficient usually depends on the history of claims at fault. Clearly, a priori and a posteriori ratings interact: to the extent that good drivers are rewarded in their base premiums (through other rating variables), the size of the bonus they require for equity is reduced.

The claims history of each policyholder consists of a short integer-valued sequence of yearly claim counts. The basic model used for experience rating is based on the Negative Binomial distribution. This probability law can be seen as a Poisson mixture distribution with Gamma mixing. It therefore allows for serial dependence of claim counts by introducing Gamma-distributed unobserved individual heterogeneity. The serial dependence in claim count sequences is generated by integrating out the unobserved factor, and by updating its prediction as individual information increases. Alternative models with LogNormal or Inverse Gaussian unobserved heterogeneity have also been considered in the actuarial literature (and reviewed in Chapter 2).


3.1.8 Loss Function

Whatever the model selected for the number of claims, the a posteriori premium correction is derived from the application of a loss function. The standard choice is a quadratic loss. In this case, the credibility premium is the function of past claim numbers that minimizes the expected squared difference with the next year's claim number. It is well known that the solution is given by the a posteriori expectation.

The penalties obtained in a credibility system calling upon a quadratic loss function are often so severe that it is almost impossible to implement them in practice, mainly for commercial reasons. In order to avoid this problem, some authors have proposed resorting to an exponential loss function: the hope is that breaking the symmetry between the overcharges and the undercharges leads to reasonable penalties. This reduces the maluses and the bonuses, and results in a financially balanced system.

3.1.9 Agenda

Section 3.2 introduces the basics of credibility models taking into account a priori characteristics. It starts with a simple introductory example that contains all the ideas of credibility theory; then, the probabilistic tools used in this context are briefly recalled.

Section 3.3 is devoted to credibility formulas based on a quadratic loss function. The optimal predictor is shown to be the conditional expectation of future claims given past claims history. In the particular case of Gamma distributed risk parameters, explicit expressions are available for the a posteriori premium corrections. Discrete Poisson mixtures are considered in detail, providing approximate credibility formulas. Also, linear credibility predictions are derived.

In Section 3.4, the quadratic loss function is replaced with an exponential one. Again, the general formulas simplify in the Negative Binomial case. Linear credibility predictions are considered as approximations to exact premium corrections.

Section 3.5 discusses the type of dependence generated by the credibility construction. It is shown that the risk parameter, as well as future claim numbers, increases with the number of claims recorded in the past, and that future and past claim numbers are positively related. This confirms the intuition behind the actuarial credibility model.

The final Section 3.6 gives the references and discusses further issues.

3.2 Credibility Models

3.2.1 A Simple Introductory Example: the Good Driver / Bad Driver Model

Consider an insurance portfolio where 60 % of the policyholders are good drivers. The probability that a good driver reports k claims during the year is given by the Poi(λ_G) distribution with λ_G = 0.05. The remaining 40 % of the policyholders are bad drivers. The probability that they report k claims is given by the Poi(λ_B) distribution with λ_B = 0.15.

A priori (i.e. at time t = 0, for a policyholder without any claim record), the actuary is not able to distinguish between good and bad drivers. The expected number of claims is then given by

\[ \Pr[\text{Good}]\,\lambda_G + \Pr[\text{Bad}]\,\lambda_B = 0.6\times 0.05 + 0.4\times 0.15 = 0.09. \]

Considering a policyholder who reported k claims during the first year, it is nevertheless possible to compute the probability that he is a good driver: calling upon Bayes' Theorem yields

\[
\Pr[\text{Good}\mid k\text{ claims}]
=\frac{\Pr[k\text{ claims}\mid\text{Good}]\,\Pr[\text{Good}]}
{\Pr[k\text{ claims}\mid\text{Good}]\,\Pr[\text{Good}]+\Pr[k\text{ claims}\mid\text{Bad}]\,\Pr[\text{Bad}]}
=\frac{\exp(-\lambda_G)\,\lambda_G^{k}\,\Pr[\text{Good}]}
{\exp(-\lambda_G)\,\lambda_G^{k}\,\Pr[\text{Good}]+\exp(-\lambda_B)\,\lambda_B^{k}\,\Pr[\text{Bad}]}.
\]

We get the values listed in Table 3.1 for increasing k. Clearly, these probabilities decrease with the number of claims reported. If the policyholder does not report any claim during the first year, then the probability that he is a good driver increases from 60 % a priori to 62.37 % a posteriori. As soon as one claim is reported during the first year, this probability decreases from 60 % a priori to 35.59 % a posteriori. The more claims are reported, the less likely the policyholder is a good driver. The a posteriori probability of being a good driver nevertheless remains positive whatever the number of claims reported to the insurance company.

A posteriori (i.e. at time t = 1), the actuary knows the number k of claims reported by the policyholder during the year and should incorporate this additional information in the price list. Specifically, if k claims have been reported, the expected number of claims for year two is

\[ \Pr[\text{Good}\mid k\text{ claims}]\,\lambda_G + \Pr[\text{Bad}\mid k\text{ claims}]\,\lambda_B. \]

The claim record of the policyholder thus modifies the weights assigned to λ_G and λ_B. The values of the expected number of claims for year two according to the number k of claims reported during the first year are displayed in Table 3.1. This elementary example contains all the ingredients of experience rating.

Table 3.1 Expected numbers of claims for year two in the good driver / bad driver model given the number k of claims reported during the first year.

# of claims k reported during year 1   Pr[Good | k claims] (%)   Pr[Bad | k claims] (%)   Expected number of claims for year 2
0                                      62.37                     37.63                    0.0876
1                                      35.59                     64.41                    0.1144
2                                      15.55                     84.45                    0.1344
3                                       5.78                     94.22                    0.1442
4                                       2.00                     98.00                    0.1480
5                                       0.68                     99.32                    0.1493
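The figures in Table 3.1 follow directly from the model specification and can be checked with a few lines of Python. This is a minimal sketch that only assumes the parameters stated above (60 % good drivers with λ_G = 0.05, 40 % bad drivers with λ_B = 0.15).

import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

p_good, lam_g, lam_b = 0.60, 0.05, 0.15

for k in range(6):
    num = poisson_pmf(k, lam_g) * p_good
    den = num + poisson_pmf(k, lam_b) * (1.0 - p_good)
    post_good = num / den                               # Pr[Good | k claims]
    e_next = post_good * lam_g + (1.0 - post_good) * lam_b
    print(f"k = {k}: Pr[Good|k] = {100 * post_good:6.2f} %, "
          f"expected claims year 2 = {e_next:.4f}")

Running this reproduces the columns of Table 3.1 (62.37 % and 0.0876 for k = 0, and so on).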

3.2.2 Credibility Models Incorporating a Priori Risk Classification

This chapter aims to design merit rating plans in accordance with the a priori ratemaking structure of the insurance company. Specifically, let us consider a portfolio with n policies, each one observed during T_i periods. Let N_it (with mean E[N_it] = λ_it) be the number of claims reported by policyholder i during year t, i.e. during the period (t − 1, t], i = 1, 2, ..., n, t = 1, 2, ..., T_i. By convention, time 0 corresponds to the issuance of the policy. We thus face a nested structure: each policyholder generates a sequence N_i = (N_i1, N_i2, ..., N_iT_i)^T of claim numbers. It is reasonable to assume independence between the series N_1, N_2, ..., N_n (at least in motor third party liability insurance), but we expect some positive dependence inside each N_i.

The ith policy of the portfolio, i = 1, 2, ..., n, is represented by a sequence (Θ_i, N_i1, N_i2, N_i3, ...). At the portfolio level, the sequences (Θ_i, N_i1, N_i2, N_i3, ...) are assumed to be independent for i = 1, 2, ..., n. The risk parameter Θ_i represents the risk proneness of policyholder i, i.e. unknown risk characteristics of the policyholder having a significant impact on the occurrence of claims; it is regarded as a random variable. Given Θ_i = θ, the random variables N_i1, N_i2, N_i3, ... are assumed to be independent. Unconditionally, these random variables are dependent since their behaviour depends on the common Θ_i.

The very basic tenets of a credibility model for claim counts are as follows:

(i) a conditional distribution for the number of claims, that is, for the N_it given Θ_i = θ;
(ii) a distribution function F_Θ for the risk parameters Θ_1, ..., Θ_n, describing how the conditional distributions vary across the portfolio;
(iii) a loss function whose expectation has to be minimized in order to find the optimal experience premium.

Let us briefly comment on these three aspects. In motor third party liability insurance portfolios, the Poisson distribution often provides a good description of the number of claims incurred by an individual policyholder during a given reference period (one year, say): the set of all Poisson assumptions should at least provide (locally in time) a good approximation to the accident generating mechanism. Given Θ_i = θ, the annual numbers of claims N_it for policyholder i are assumed to be independent and to conform to a Poisson distribution with mean θλ_it. As in Chapter 2, λ_it is a known function of the exposure-to-risk and possibly other covariates.

Let us now consider the choice of F_Θ. Traditionally, actuaries have assumed that the distribution of Θ values among all drivers is well approximated by a two-parameter Gamma distribution. The resulting probability distribution for the number of claims is Negative Binomial. Other classical choices for F_Θ include the Inverse Gaussian and the LogNormal distributions, as explained in Chapters 1–2.

Regarding (iii), quadratic and exponential loss functions will be considered in this chapter. This leads to the following model.


Definition 3.1 In the Poisson credibility model, the ith policy of the portfolio, i = 1, 2, ..., n, is represented by a sequence (Θ_i, N_i), where Θ_i is a positive random variable with unit mean representing the unexplained heterogeneity. Moreover,

A1: given Θ_i = θ, the random variables N_it, t = 1, 2, ..., are independent and conform to the Poi(θλ_it) distribution;
A2: at the portfolio level, the sequences (Θ_i, N_i), i = 1, 2, ..., n, are assumed to be independent.

It is essential to understand the meaning of this classical actuarial construction. In Definition 3.1, dependence between annual claim numbers is a consequence of the heterogeneity of the portfolio (i.e. of Θ_i); the dependence is only apparent. If we had complete knowledge of policy characteristics then Θ_i would become deterministic and there would be no dependence between the N_it for fixed i. The unexplained heterogeneity (which has been modelled through the introduction of the risk parameter Θ_i for policyholder i) is then revealed by the claims and premiums histories in a Bayesian way. These histories modify the distribution of Θ_i and hence modify the premium.

Let

\[ N_{i\bullet}=\sum_{t=1}^{T_i} N_{it} \qquad\text{and}\qquad \lambda_{i\bullet}=\sum_{t=1}^{T_i}\lambda_{it} \qquad (3.1) \]

be the total observed and expected claim numbers for policyholder i during the T_i observation periods; the statistic N_i• is a convenient summary of past claims history. Let us prove that, in the credibility model of Definition 3.1, N_i• is an exhaustive summary of the past claims history.

Property 3.1 In the Poisson credibility model of Definition 3.1, the predictive distribution of Θ_i only depends on N_i•, i.e. the equality

\[ \Pr[\Theta_i\le t\mid N_{i1},N_{i2},\ldots,N_{iT_i}] = \Pr[\Theta_i\le t\mid N_{i\bullet}] \]

holds true whatever t ≥ 0.

Proof. Let f_Θ(· | k_i1, ..., k_iT_i) be the conditional probability density function of Θ_i given that N_i1 = k_i1, ..., N_iT_i = k_iT_i, and let

\[ k_{i\bullet}=\sum_{t=1}^{T_i} k_{it} \]

be the total number of accidents reported by policyholder i to the company. We can then write

\[
f_\Theta(\theta\mid k_{i1},\ldots,k_{iT_i})
=\frac{\Pr[N_{i1}=k_{i1},\ldots,N_{iT_i}=k_{iT_i}\mid\Theta_i=\theta]\,f_\Theta(\theta)}
{\Pr[N_{i1}=k_{i1},\ldots,N_{iT_i}=k_{iT_i}]}
=\frac{\exp(-\theta\lambda_{i\bullet})\,\theta^{k_{i\bullet}}\,f_\Theta(\theta)}
{\int_0^{+\infty}\exp(-\xi\lambda_{i\bullet})\,\xi^{k_{i\bullet}}\,f_\Theta(\xi)\,d\xi},
\]

which depends only on k_i•. This ends the proof.


This result has an important practical consequence: the Poisson credibility model A1–A2 disregards the age of the claims. The penalty induced by an old claim is strictly identical to the one induced by a recent claim. This may of course sometimes be undesirable for commercial purposes. We will come back to this issue in the last section of this chapter.

3.3 Credibility Formulas with a Quadratic Loss Function

3.3.1 Optimal Least-Squares Predictor

Often in applied probability, one seeks to approximate an unknown quantity by a function of a set of related variables, by minimizing the expected squared difference between the two items. This leads to the least-squares principle, and to the conditional expectation, as shown in the next result.

Proposition 3.1 Let us consider a sequence of random variables X_1, X_2, X_3, ... and a risk parameter Θ. Given Θ, the X_t are independent. The first two moments of the X_t are assumed to be finite. Moreover, the conditional mean of the X_t is given by

\[ \mu_t(\Theta)=E[X_t\mid\Theta], \quad t=1,2,3,\ldots, \]

and E[μ_t(Θ)] = μ_t. The minimum of

\[ E\Bigl[\bigl(\mu_{T+1}(\Theta)-\Psi(X_1,X_2,\ldots,X_T)\bigr)^2\Bigr] \]

over all the measurable functions Ψ: ℝ^T → ℝ is obtained for

\[ \Psi^\star(X_1,X_2,\ldots,X_T)=E[X_{T+1}\mid X_1,X_2,\ldots,X_T]=E[\mu_{T+1}(\Theta)\mid X_1,X_2,\ldots,X_T]. \]

Proof. An easy way to get the announced result consists in noting that

\[
E\Bigl[\bigl(\mu_{T+1}(\Theta)-\Psi(X_1,\ldots,X_T)\bigr)^2\Bigr]
=E\Bigl[\bigl(\mu_{T+1}(\Theta)-\Psi^\star(X_1,\ldots,X_T)+\Psi^\star(X_1,\ldots,X_T)-\Psi(X_1,\ldots,X_T)\bigr)^2\Bigr]
\]
\[
=E\Bigl[\bigl(\mu_{T+1}(\Theta)-\Psi^\star(X_1,\ldots,X_T)\bigr)^2\Bigr]
+E\Bigl[\bigl(\Psi^\star(X_1,\ldots,X_T)-\Psi(X_1,\ldots,X_T)\bigr)^2\Bigr]
\]

(the cross term vanishes upon conditioning on X_1, ..., X_T), which is clearly minimal for Ψ ≡ Ψ*.


Proposition 3.1 indicates that the best approximation (with respect to mean squared error) to μ_{T+1}(Θ) given X_1, X_2, ..., X_T is E[X_{T+1} | X_1, X_2, ..., X_T], that is, the posterior expectation of X_{T+1} given X_1, X_2, ..., X_T (also called the predictive mean). The posterior distribution of X_{T+1} is then obtained by conditioning on past claims history.

To calculate the predictive mean one needs a conditional distribution of losses given the parameter of interest (often the conditional mean) and a prior distribution of the parameter of interest.
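A quick Monte Carlo experiment makes Proposition 3.1 tangible. The sketch below assumes hypothetical Poisson-Gamma values (Θ ~ Gam(a, a), and T i.i.d. Poi(Θλ) observations per policy, with a, λ and T invented for illustration); the posterior mean of the next-year frequency, derived later in Section 3.3.4, attains a smaller mean squared error against μ_{T+1}(Θ) = λΘ than the naive individual sample mean.

import numpy as np

rng = np.random.default_rng(seed=1)
a, lam, T, n = 1.0, 0.10, 5, 200_000        # hypothetical values, for illustration

theta = rng.gamma(shape=a, scale=1.0 / a, size=n)          # E[Theta] = 1
counts = rng.poisson(theta[:, None] * lam, size=(n, T))    # T years per policy
n_dot = counts.sum(axis=1)

target = lam * theta                                       # mu_{T+1}(Theta)
posterior_mean = lam * (a + n_dot) / (a + T * lam)         # conditional-expectation predictor
sample_mean = n_dot / T                                    # naive predictor

print("MSE posterior mean:", np.mean((posterior_mean - target) ** 2))
print("MSE sample mean   :", np.mean((sample_mean - target) ** 2))
# The conditional expectation wins, as Proposition 3.1 guarantees.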

3.3.2 Predictive Distribution

Let us now come back to the credibility model of Definition 3.1. The conditional distribution of N_{i,T_i+1} given N_i1 = k_1, ..., N_iT_i = k_{T_i} is called the predictive distribution. It tells the actuary what the next year's number of claims might be, given the information contained in past claims history. It is the relevant distribution for risk analysis, management and decision making.

In our case, we have

\[
\Pr[N_{i,T_i+1}=k\mid N_{i1}=k_1,\ldots,N_{iT_i}=k_{T_i}]
=\frac{\int_0^{\infty}\Bigl(\prod_{t=1}^{T_i}\Pr[N_{it}=k_t\mid\Theta_i=\theta]\Bigr)\Pr[N_{i,T_i+1}=k\mid\Theta_i=\theta]\,dF_\Theta(\theta)}
{\int_0^{\infty}\Bigl(\prod_{t=1}^{T_i}\Pr[N_{it}=k_t\mid\Theta_i=\theta]\Bigr)\,dF_\Theta(\theta)}
\]
\[
=\frac{\int_0^{\infty}\exp(-\theta\lambda_{i\bullet})\,\theta^{k_\bullet}\,\exp(-\theta\lambda_{i,T_i+1})\dfrac{(\theta\lambda_{i,T_i+1})^k}{k!}\,dF_\Theta(\theta)}
{\int_0^{\infty}\exp(-\theta\lambda_{i\bullet})\,\theta^{k_\bullet}\,dF_\Theta(\theta)}.
\]

Now, the posterior distribution of Θ_i given past claims history N_i1 = k_1, ..., N_iT_i = k_{T_i} is given by

\[
\frac{\Bigl(\prod_{t=1}^{T_i}\exp(-\theta\lambda_{it})\dfrac{(\theta\lambda_{it})^{k_{it}}}{k_{it}!}\Bigr)\,dF_\Theta(\theta)}
{\int_0^{\infty}\Bigl(\prod_{t=1}^{T_i}\exp(-\xi\lambda_{it})\dfrac{(\xi\lambda_{it})^{k_{it}}}{k_{it}!}\Bigr)\,dF_\Theta(\xi)}
=\frac{\exp(-\theta\lambda_{i\bullet})\,\theta^{k_\bullet}\,dF_\Theta(\theta)}
{\int_0^{\infty}\exp(-\xi\lambda_{i\bullet})\,\xi^{k_\bullet}\,dF_\Theta(\xi)}.
\]

Hence,

\[
\Pr[N_{i,T_i+1}=k\mid N_{i1}=k_1,\ldots,N_{iT_i}=k_{T_i}]
=\int_0^{\infty}\exp(-\theta\lambda_{i,T_i+1})\frac{(\theta\lambda_{i,T_i+1})^k}{k!}\,dF_\Theta(\theta\mid k_\bullet),
\]

where F_Θ(· | k_•) is the conditional distribution function of Θ_i given N_i• = k_•. The predictive distribution thus appears as a Poisson mixture distribution, where the mixing is with respect to the posterior distribution of Θ_i. Past claims history N_i1 = k_1, ..., N_iT_i = k_{T_i} modifies the distribution of Θ_i, and this modified distribution is used as a new mixing law for the number of claims N_{i,T_i+1} for year T_i + 1.
<strong>of</strong> claims N iTi +1 for year T i + 1.<br />

3.3.3 Bayesian <strong>Credibility</strong> Premium<br />

We are looking for the function ⋆ <strong>of</strong> N i1 N iTi that is the closest to i , i.e. minimizing<br />

[ (i<br />

E − N i1 N iTi ) ] 2<br />

over all the measurable functions T i<br />

→ . Proposition 3.1 gives the solution <strong>of</strong> this<br />

optimization problem:<br />

⋆ N i1 N iTi = E [ i<br />

∣ ∣ N i1 N iTi<br />

]<br />

<br />

In general, the posterior mean <strong>of</strong> i given N i1 = k 1 N iTi = k Ti<br />

is given by<br />

E [ i<br />

∣ ∣ N i1 = k 1 N iTi = k Ti<br />

]<br />

=<br />

∫ +<br />

=<br />

(<br />

∏Ti<br />

<br />

0<br />

∫ +<br />

0<br />

)<br />

t=1 PrN it = k t i = dF <br />

(<br />

∏Ti<br />

)<br />

t=1 PrN it = k t i = dF <br />

∫ +<br />

0<br />

exp− i• k •+1 dF <br />

∫ +<br />

0<br />

exp− i• k • dF (3.2)<br />

This a posteriori expectation thus appears as the ratio <strong>of</strong> two Mellin transforms Mk =<br />

Eexp− i i k <strong>of</strong> i. It is interesting to note that the a posteriori expectation depends<br />

only on the total number k • <strong>of</strong> accidents caused in the past T i years <strong>of</strong> insurance, and not on<br />

the history <strong>of</strong> these claims. This was expected from Property 3.1. This is a characteristic <strong>of</strong><br />

the credibility models with static random effects.<br />

The Bayesian credibility premium is simply the mean <strong>of</strong> the predictive distribution. It is<br />

given by<br />

E [ ∣<br />

] ∫ <br />

N iTi +1<br />

∣N i1 = k 1 N iTi = k Ti = iTi +1dF k • <br />

0<br />

= iTi +1E [ ∣ ]<br />

i N i1 = k 1 N iTi = k Ti<br />

where the posterior expectation <strong>of</strong> i is given by (3.2). The posterior expected claim number<br />

EN iTi +1N i1 N iTi is obtained by multiplying iTi +1 by the correction coefficient<br />

E i N i1 N iTi . This approach always yields financial balance, since<br />

[<br />

E E [ ∣ ] ]<br />

i N i1 = k 1 N iTi = k Ti<br />

= E i = 1<br />

so that the corrections average to unity.
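Outside conjugate cases, the ratio of integrals in (3.2) has to be evaluated numerically. As a minimal sketch, the Python code below does so for a hypothetical unit-mean LogNormal risk parameter using scipy's quadrature; the values of σ, λ_i• and k_• are invented for illustration.

import numpy as np
from scipy.integrate import quad
from scipy.stats import lognorm

sigma = 0.8
mixing = lognorm(s=sigma, scale=np.exp(-sigma ** 2 / 2))   # E[Theta] = 1

def posterior_mean(lam_dot, k_dot):
    # Formula (3.2): ratio of two integrals against the mixing density
    num, _ = quad(lambda t: t ** (k_dot + 1) * np.exp(-lam_dot * t) * mixing.pdf(t), 0, np.inf)
    den, _ = quad(lambda t: t ** k_dot * np.exp(-lam_dot * t) * mixing.pdf(t), 0, np.inf)
    return num / den

for k in range(4):
    print(f"k_dot = {k}: a posteriori correction = {posterior_mean(0.5, k):.4f}")

As expected, the correction sits below 1 for a claim-free history (k_• = 0) and increases with the number of reported claims.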


3.3.4 Poisson-Gamma Credibility Model

Gamma Distribution for the Random Effects

Let us assume that Θ_i ~ Gam(a, a) with probability density function (1.35). The joint probability mass function of the random vector N_i = (N_i1, N_i2, ..., N_iT_i) is given by (2.19). The joint probability density of the random vector (Θ_i, N_i1, N_i2, ..., N_iT_i) is given by

\[
\prod_{t=1}^{T_i}\exp(-\theta_i\lambda_{it})\frac{(\theta_i\lambda_{it})^{k_{it}}}{k_{it}!}\;
\frac{1}{\Gamma(a)}\,a^a\,\theta_i^{a-1}\exp(-a\theta_i)
\propto
\exp\Bigl(-\theta_i\sum_{t=1}^{T_i}\lambda_{it}\Bigr)\,
\theta_i^{\sum_{t=1}^{T_i}k_{it}+a-1}\exp(-a\theta_i). \qquad (3.3)
\]

A Posteriori Distribution of the Random Effects

In Section 2.5.1, we established that in a two-period model, the a posteriori distribution of Θ_i remained Gamma, with updated parameters. Let us now extend the result to a multiperiod setting.

The conditional distribution of Θ_i given the past claim numbers N_it = k_it, t = 1, 2, ..., T_i, is obtained from (2.19) and (3.3). This gives

\[
\frac{\exp\Bigl(-\theta\bigl(a+\sum_{t=1}^{T_i}\lambda_{it}\bigr)\Bigr)\,\theta^{\,a+\sum_{t=1}^{T_i}k_{it}-1}}
{\int_0^{+\infty}\exp\Bigl(-\xi\bigl(a+\sum_{t=1}^{T_i}\lambda_{it}\bigr)\Bigr)\,\xi^{\,a+\sum_{t=1}^{T_i}k_{it}-1}\,d\xi}
=\exp\bigl(-\theta(a+\lambda_{i\bullet})\bigr)\,\theta^{\,a+k_{i\bullet}-1}\,
\frac{(a+\lambda_{i\bullet})^{\,a+k_{i\bullet}}}{\Gamma(a+k_{i\bullet})}.
\]

Coming back to (1.34), we recognize a Gamma probability density function. Specifically, we thus have that

\[ \Theta_i\mid N_{i1},N_{i2},\ldots,N_{iT_i}\sim \mathrm{Gam}\bigl(a+N_{i\bullet},\,a+\lambda_{i\bullet}\bigr). \qquad (3.4) \]

The correction coefficient is given by

\[ E[\Theta_i\mid N_{i1},N_{i2},\ldots,N_{iT_i}]=\frac{a+N_{i\bullet}}{a+\lambda_{i\bullet}}, \]

which clearly increases in the past claims N_i•. The variance of Θ_i given past claim history is given by

\[ V[\Theta_i\mid N_{i1},N_{i2},\ldots,N_{iT_i}]=\frac{a+N_{i\bullet}}{(a+\lambda_{i\bullet})^2}. \]

The expected number of claims in year T_i + 1 given past claims history is then

\[ E[N_{i,T_i+1}\mid N_{i1},N_{i2},\ldots,N_{iT_i}]=\lambda_{i,T_i+1}\,E[\Theta_i\mid N_{i1},N_{i2},\ldots,N_{iT_i}]=\frac{a+N_{i\bullet}}{a+\lambda_{i\bullet}}\,\lambda_{i,T_i+1}. \]


Let us briefly comment on this a posteriori correction:

• The a posteriori corrections become more severe as the residual heterogeneity, measured by V[Θ_i] = 1/a, increases.

• Consider two policyholders (numbered i_1 and i_2) observed during T_{i_1} = T_{i_2} periods, such that i_1 is a priori a better driver than i_2, that is, λ_{i_1•} < λ_{i_2•}. If these policyholders do not report any claim (i.e. N_{i_1•} = N_{i_2•} = 0), then the corrections to be applied to these policyholders satisfy

\[ \frac{a}{a+\lambda_{i_1\bullet}} > \frac{a}{a+\lambda_{i_2\bullet}}, \]

so that the a priori worse driver receives the larger discount.

• If these policyholders report k ≥ 1 claims (i.e. N_{i_1•} = N_{i_2•} = k), then the penalties are such that

\[ \frac{a+k}{a+\lambda_{i_1\bullet}} > \frac{a+k}{a+\lambda_{i_2\bullet}}. \]

Hence, the penalty for the a priori bad driver is less severe than for the good one.
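Both comparisons are immediate to reproduce. The sketch below uses a hypothetical heterogeneity parameter a = 1.065, a value roughly consistent with the corrections displayed in Tables 3.2–3.4 (it is not the fitted parameter from Table 2.7, which is not reproduced here), together with the annual frequencies of the good and bad drivers considered in the next section.

# A posteriori correction E[Theta | N_dot = k] = (a + k) / (a + lam_dot)
def correction(a, annual_freq, years, k):
    return (a + k) / (a + annual_freq * years)

a = 1.065                       # hypothetical Gamma parameter, for illustration
good, bad = 0.0928, 0.2840      # a priori annual claim frequencies

for k in (0, 1):
    print(f"k = {k}: good driver {100 * correction(a, good, 1, k):.1f} %, "
          f"bad driver {100 * correction(a, bad, 1, k):.1f} %")
# k = 0: the a priori worse driver receives the larger discount (78.9 % vs 92.0 %).
# k = 1: the penalty is less severe for the a priori bad driver (153.1 % vs 178.4 %).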

3.3.5 Predictive Distribution and Bayesian Credibility Premium

From (3.4) we know that the posterior distribution of Θ_i given past claims history is still Gamma, with parameters a + N_i• and a + λ_i•. Therefore, the predictive distribution of N_{i,T_i+1} is Negative Binomial, that is,

\[
\Pr[N_{i,T_i+1}=k\mid N_{i\bullet}=k_\bullet]
=\binom{a+k_\bullet+k-1}{k}
\left(\frac{\lambda_{i,T_i+1}}{a+\lambda_{i\bullet}+\lambda_{i,T_i+1}}\right)^{k}
\left(\frac{a+\lambda_{i\bullet}}{a+\lambda_{i\bullet}+\lambda_{i,T_i+1}}\right)^{a+k_\bullet}.
\]

Furthermore, the Bayesian credibility premium is given by

\[ E[N_{i,T_i+1}\mid N_{i\bullet}=k_\bullet]=\lambda_{i,T_i+1}\,E[\Theta_i\mid N_{i\bullet}=k_\bullet]=\frac{a+k_\bullet}{a+\lambda_{i\bullet}}\,\lambda_{i,T_i+1}. \]
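The Negative Binomial form of the predictive distribution can be cross-checked numerically: mixing the Poisson distribution over the Gam(a + k_•, a + λ_i•) posterior must reproduce the closed-form probabilities. The sketch below does so with hypothetical values of a, λ_i•, λ_{i,T_i+1} and k_•.

import math
from scipy.integrate import quad
from scipy.stats import nbinom, gamma

a, lam_dot, lam_next, k_dot = 1.065, 0.2840, 0.2840, 1   # hypothetical values
p = (a + lam_dot) / (a + lam_dot + lam_next)
predictive = nbinom(a + k_dot, p)                        # closed-form predictive law

# Cross-check: mix Poisson over the Gam(a + k_dot, a + lam_dot) posterior
posterior = gamma(a + k_dot, scale=1.0 / (a + lam_dot))
def mixed_pmf(k):
    integrand = lambda t: math.exp(-lam_next * t) * (lam_next * t) ** k \
                          / math.factorial(k) * posterior.pdf(t)
    return quad(integrand, 0, float("inf"))[0]

for k in range(3):
    print(k, round(float(predictive.pmf(k)), 6), round(mixed_pmf(k), 6))  # columns agree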

Remark 3.1 As time goes on, the Bayesian credibility premium tends to 0 for a policyholder reporting no claims. This can be seen as an unrealistic feature. An easy way to avoid this problem is to decompose N_i into two parts: a component N_i^(1) distributed according to a pure Poisson distribution, representing the claims occurring purely at random, and a component N_i^(2) that is Negative Binomial and influenced by the driver's abilities. This introduces a lower bound on the a posteriori claim frequency, which is no longer allowed to vanish in the long term.

It is worth mentioning that the Bayesian credibility premium can be cast into

\[
E[N_{i,T_i+1}\mid N_{i\bullet}=k_\bullet]
=\left(\frac{a}{a+\lambda_{i\bullet}}\,E[\Theta_i]+\frac{\lambda_{i\bullet}}{a+\lambda_{i\bullet}}\,\frac{k_\bullet}{\lambda_{i\bullet}}\right)E[N_{i,T_i+1}].
\]

Recall that E[Θ_i] = 1 and E[N_{i,T_i+1}] = λ_{i,T_i+1}. Hence, the Bayesian credibility premium is obtained by multiplying the a priori expected number of claims E[N_{i,T_i+1}] by an appropriate correction factor. This factor appears as a weighted average of the prior expectation of Θ_i, receiving weight a/(a + λ_i•), and the average claim frequency for that policy, k_•/λ_i•, receiving weight λ_i•/(a + λ_i•).

3.3.6 Numerical Illustration

Let us now compute the coefficients to apply to the pure premium according to the number of claims reported in the past. To this end, let us consider the Negative Binomial fit of Portfolio A given in Table 2.7. Table 3.2 displays the values of E[Θ_i | N_i• = k_•] for different combinations of observed periods T_i and number of past claims k_• for a good driver (with observable characteristics: man, age 35, rural area, upfront premium, private use, λ_i = 0.0928). Tables 3.3–3.4 are the analogues for an average driver (with observable characteristics: woman, age 25, urban area, upfront premium, private use, λ_i = 0.1408) and a bad driver (with observable characteristics: man, age 22, rural area, split premium, private use, λ_i = 0.2840), respectively.

If the good driver does not report any accident during the first year, we see from Table 3.2 that he will pay 92 % of the base premium to be covered during the second year. If, in addition, he does not file any claim during the second year, the premium decreases to 85.2 % of the base premium. After ten claim-free years, he will have to pay 53.4 % of the base premium to be covered by the insurer.

Considering the average driver, we see from Table 3.3 that he will be awarded more discount than the good driver if he does not file any claim. Indeed, he will have to pay 88.3 % (instead of 92 %) of the base premium after one claim-free year, 79.1 % (instead of 85.2 %) of the base premium after two claim-free years, and 43.1 % (instead of 53.4 %) after ten claim-free years.

Table 3.2 Values of E[Θ_i | N_i• = k_•] for different combinations of observed periods T_i and number of past claims k_• for a good driver from Portfolio A (average annual claim frequency of 9.28 %).

T_i    k•=0      k•=1      k•=2      k•=3      k•=4      k•=5
 1     92.0 %    178.4 %   264.7 %   351.1 %   437.5 %   523.8 %
 2     85.2 %    165.1 %   245.1 %   325.0 %   405.0 %   485.0 %
 3     79.3 %    153.7 %   228.2 %   302.6 %   377.0 %   451.5 %
 4     74.2 %    143.8 %   213.4 %   283.0 %   352.7 %   422.3 %
 5     69.7 %    135.1 %   200.5 %   265.9 %   331.3 %   396.7 %
 6     65.7 %    127.3 %   189.0 %   250.6 %   312.3 %   374.0 %
 7     62.1 %    120.4 %   178.8 %   237.1 %   295.4 %   353.7 %
 8     58.9 %    114.3 %   169.6 %   224.9 %   280.2 %   335.6 %
 9     56.0 %    108.7 %   161.3 %   213.9 %   266.6 %   319.2 %
10     53.4 %    103.6 %   153.8 %   204.0 %   254.1 %   304.3 %


Table 3.3 Values of E[Θ_i | N_i• = k_•] for different combinations of observed periods T_i and number of past claims k_• for an average driver from Portfolio A (average annual claim frequency of 14.09 %).

T_i    k•=0      k•=1      k•=2      k•=3      k•=4      k•=5
 1     88.3 %    171.2 %   254.2 %   337.1 %   420.0 %   503.0 %
 2     79.1 %    153.3 %   227.6 %   301.8 %   376.1 %   450.4 %
 3     71.6 %    138.8 %   206.0 %   273.3 %   340.5 %   407.7 %
 4     65.4 %    126.8 %   188.2 %   249.6 %   311.0 %   372.5 %
 5     60.2 %    116.7 %   173.2 %   229.8 %   286.3 %   342.8 %
 6     55.8 %    108.1 %   160.5 %   212.8 %   265.2 %   317.5 %
 7     51.9 %    100.7 %   149.4 %   198.2 %   247.0 %   295.7 %
 8     48.6 %     94.2 %   139.8 %   185.5 %   231.1 %   276.7 %
 9     45.7 %     88.5 %   131.4 %   174.3 %   217.1 %   260.0 %
10     43.1 %     83.5 %   123.9 %   164.3 %   204.8 %   245.2 %

Table 3.4 Values of E[Θ_i | N_i• = k_•] for different combinations of observed periods T_i and number of past claims k_• for a bad driver from Portfolio A (average annual claim frequency of 28.40 %).

T_i    k•=0      k•=1      k•=2      k•=3      k•=4      k•=5
 1     78.9 %    153.1 %   227.2 %   301.3 %   375.5 %   449.6 %
 2     65.2 %    126.4 %   187.7 %   248.9 %   310.2 %   371.4 %
 3     55.5 %    107.7 %   159.9 %   212.0 %   264.2 %   316.4 %
 4     48.4 %     93.8 %   139.2 %   184.7 %   230.1 %   275.5 %
 5     42.9 %     83.1 %   123.3 %   163.6 %   203.8 %   244.0 %
 6     38.5 %     74.6 %   110.7 %   146.8 %   182.9 %   219.0 %
 7     34.9 %     67.6 %   100.4 %   133.1 %   165.9 %   198.6 %
 8     31.9 %     61.9 %    91.8 %   121.8 %   151.8 %   181.7 %
 9     29.4 %     57.0 %    84.6 %   112.3 %   139.9 %   167.5 %
10     27.3 %     52.9 %    78.5 %   104.1 %   129.7 %   155.3 %

Table 3.4 shows higher discounts for the bad driver, with percentages of 78.9 % after one claim-free year, 65.2 % after two claim-free years and 27.3 % after ten claim-free years. The discounts awarded to policyholders who do not report any accident to the insurance company are thus increasing with the a priori annual expected claim frequency. The more claims are expected by the insurance company on the basis of observable characteristics, the higher the discount in case no claims are reported. Note that the a posteriori expected claim frequencies remain large for a priori bad drivers, since reporting no claims happens with a smaller probability.

Now, considering the penalty in case one claim is reported, we see that the good driver who reports one claim during the first year will have to pay 178.4 % of the base premium to be covered by the insurance company during the second year. The average driver in the same situation pays 171.2 % of the base premium, and the bad driver 153.1 % of the base premium. The penalties in the case where an accident is reported to the company are thus decreasing with the a priori annual expected claim frequencies.

The system appears rather severe. Reporting a claim entails a penalty of between 50 and 75 % of the base premium, which seems difficult to implement in practice. This is a consequence of the financial balance property: the weighted average of all figures (in each infinite row) is equal to 100 %. The modest discounts awarded to the majority of claim-free policyholders have to be exactly compensated by the penalties supported by the minority of policyholders reporting claims to the company. This causes large penalties. The severity of the credibility corrections also appears in the number of claim-free years needed to erase the penalty induced by an accident: 10 years for the good driver, 7 for the average one and 3 for the bad one. These periods make sense compared to the average claim number per policy.
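The financial balance property behind these figures can be verified numerically: weighting each correction (a + k)/(a + λ_i•) by the Negative Binomial probability of observing k claims over the observation period gives exactly 100 %. This is a sketch for the good driver at T_i = 1, again with the hypothetical stand-in value a = 1.065 used above.

from scipy.stats import nbinom

a, lam_dot = 1.065, 0.0928          # hypothetical a; good driver, T_i = 1
p = a / (a + lam_dot)
marginal = nbinom(a, p)             # marginal distribution of N_dot over the period

balance = sum(marginal.pmf(k) * (a + k) / (a + lam_dot) for k in range(200))
print(f"average a posteriori correction: {100 * balance:.2f} %")  # prints 100.00 %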

3.3.7 Discrete Poisson Mixture Credibility Model

The good driver / bad driver model presented in Section 3.2.1 assumes that the portfolio is composed of two classes of insured drivers. This can easily be extended to several categories of drivers (for instance, bad, below average, average, above average, excellent). Specifically, let us assume that each risk class of the portfolio is made of q categories of insured drivers, and let us denote as p_1, p_2, ..., p_q the proportions of drivers in each of these categories. Then,

\[
\Theta_i=
\begin{cases}
\theta_1 & \text{with probability } p_1\\
\theta_2 & \text{with probability } p_2\\
\vdots & \\
\theta_q & \text{with probability } p_q
\end{cases}
\qquad (3.5)
\]

with 0 < θ_1 < θ_2 < ... < θ_q. The number N_it of claims caused by policyholder i during year t is distributed as

\[ \Pr[N_{it}=k]=\sum_{j=1}^{q}p_j\exp(-\lambda_{it}\theta_j)\frac{(\lambda_{it}\theta_j)^k}{k!}, \qquad k=0,1,\ldots \]

Note that the special case q = 2 gives the good driver / bad driver model.

The expected number of claims reported by policyholder i in year T_i + 1 is given by

\[ E[N_{i,T_i+1}]=\sum_{j=1}^{q}E[N_{i,T_i+1}\mid\Theta_i=\theta_j]\,p_j=\lambda_{i,T_i+1}\sum_{j=1}^{q}\theta_j p_j. \]

If we know that this policyholder reported k claims during the past T_i years, we expect E[N_{i,T_i+1} | N_i• = k] claims in year T_i + 1. The computation of E[N_{i,T_i+1} | N_i• = k] requires the conditional distribution of N_{i,T_i+1} given N_i• = k. We get it from

\[
\Pr[N_{i,T_i+1}=l\mid N_{i\bullet}=k]
=\frac{\Pr[N_{i,T_i+1}=l\text{ and }N_{i\bullet}=k]}{\Pr[N_{i\bullet}=k]}
=\frac{\sum_{j=1}^{q}\Pr[N_{i,T_i+1}=l\mid\Theta_i=\theta_j]\Pr[N_{i\bullet}=k\mid\Theta_i=\theta_j]\,p_j}
{\sum_{j=1}^{q}\Pr[N_{i\bullet}=k\mid\Theta_i=\theta_j]\,p_j}
\]
\[
=\sum_{j=1}^{q}\exp(-\lambda_{i,T_i+1}\theta_j)\frac{(\lambda_{i,T_i+1}\theta_j)^l}{l!}\;
\frac{p_j\exp(-\lambda_{i\bullet}\theta_j)\,\theta_j^{k}}
{\sum_{m=1}^{q}p_m\exp(-\lambda_{i\bullet}\theta_m)\,\theta_m^{k}}.
\]

Hence, given N_i• = k, the law of N_{i,T_i+1} appears as a discrete Poisson mixture with modified weights

\[ \tilde p_j(k)=\frac{p_j\exp(-\lambda_{i\bullet}\theta_j)\,\theta_j^{k}}{\sum_{m=1}^{q}p_m\exp(-\lambda_{i\bullet}\theta_m)\,\theta_m^{k}}. \]

The a posteriori expectation of N_{i,T_i+1} is then

\[ E[N_{i,T_i+1}\mid N_{i\bullet}=k]=\lambda_{i,T_i+1}\sum_{j=1}^{q}\theta_j\tilde p_j(k)=\lambda_{i,T_i+1}\,\frac{\sum_{j=1}^{q}p_j\exp(-\lambda_{i\bullet}\theta_j)\,\theta_j^{k+1}}{\sum_{m=1}^{q}p_m\exp(-\lambda_{i\bullet}\theta_m)\,\theta_m^{k}}, \qquad (3.6) \]

and the posterior mean of Θ_i given N_i• = k is given by

\[ E[\Theta_i\mid N_{i\bullet}=k]=\frac{\sum_{j=1}^{q}p_j\exp(-\lambda_{i\bullet}\theta_j)\,\theta_j^{k+1}}{\sum_{m=1}^{q}p_m\exp(-\lambda_{i\bullet}\theta_m)\,\theta_m^{k}}. \qquad (3.7) \]

Compared to (3.2), the integrals now reduce to sums over the q components of the discrete mixture.

Even if the reality <strong>of</strong> the insurance portfolio is a discrete mixture (with a i specific to<br />

policyholder i, resulting in q = n), this model is not convenient since it involves a large<br />

number <strong>of</strong> parameters. The discrete Poisson mixture nevertheless deserves interest as an<br />

approximation <strong>of</strong> more general Poisson mixtures, as discussed below.<br />
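Formulas (3.6)–(3.7) are easy to evaluate numerically. The following sketch (in Python; the function name and the two-point example figures are ours, not the book's) computes the a posteriori expectation for a discrete Poisson mixture:

```python
import numpy as np

def posterior_mean_theta(theta, p, lam_past, k):
    """E[Theta | N_past = k] for a discrete Poisson mixture, formula (3.7)."""
    theta, p = np.asarray(theta, float), np.asarray(p, float)
    w = p * np.exp(-lam_past * theta) * theta**k   # unnormalized weights p~_j(k)
    return (w * theta).sum() / w.sum()

# Good driver / bad driver special case (q = 2, hypothetical figures with mean 1):
theta, p = [0.6, 2.2], [0.75, 0.25]
lam_next, lam_past = 0.10, 0.30                    # a priori claim frequencies
for k in range(4):
    # formula (3.6): expected number of claims next year after k past claims
    print(k, lam_next * posterior_mean_theta(theta, p, lam_past, k))
```

Because the weights $\tilde p_j(k)$ shift towards the larger $\theta_j$'s as $k$ grows, the printed expectations increase with the number of past claims, in line with Section 3.5 below.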

3.3.8 Discrete Approximations for the Heterogeneous Component

Moment Spaces

Apart from the Poisson-Gamma case, the computation of the conditional expectation giving the credibility premium requires numerical integration. A convenient alternative is to approximate the mixing distribution by a suitable discrete analogue sharing the same sequence of moments. We are then back to the discrete Poisson mixture credibility model studied above, and we benefit from the easy-to-compute formulas (3.6)–(3.7) valid in this case.

The discrete approximations to $\Theta_i$ are based on the knowledge of its support, $[0,b]$ say, with $b$ possibly infinite, and of its first few moments $\mu_1,\mu_2,\ldots$ In general, let us denote by $\mathcal{B}_s([0,b])$ the class of all the random variables $X$ with support in $[0,b]$ and with prescribed first $s-1$ moments $E[X^k]=\mu_k$, $k=1,2,\ldots,s-1$. In the literature, $\mathcal{B}_s([0,b])$ is referred to as a moment space. Properly speaking, it is a class of distribution functions rather than a class of random variables. Classical problems related to moment spaces are, for instance, the determination of conditions on $[0,b],\mu_1,\mu_2,\ldots,\mu_{s-1}$ ensuring that $\mathcal{B}_s([0,b])$ is not void, or the determination of the elements of $\mathcal{B}_s([0,b])$ with the minimal number of support points. These elements possess some extremal properties and will be used here in connection with credibility formulas. Henceforth, we tacitly assume that $[0,b]$ and $\mu_1,\mu_2,\ldots,\mu_{s-1}$ are such that the associated moment space $\mathcal{B}_s([0,b])$ is not void and is not a singleton (i.e., $\mathcal{B}_s([0,b])$ consists of at least two distinct distribution functions).

De Vylder's Moment Problem

De Vylder (1996, Section 8.3) investigated the following problem: within $\mathcal{B}_s([0,b])$, determine the random variables $X^{(s)}_{\min}$ and $X^{(s)}_{\max}$ such that the inequalities

$$
E\big[(X^{(s)}_{\min})^s\big] \le E[X^s] \le E\big[(X^{(s)}_{\max})^s\big] \quad\text{hold for all } X\in\mathcal{B}_s([0,b]). \qquad (3.8)
$$

Explicit solutions to (3.8) are available for $s$ up to five. The supports of the extremal distributions are given in Tables 3.5–3.6.

As shown in Denuit, De Vylder & Lefèvre (1999), the random variables $X^{(s)}_{\min}$ and $X^{(s)}_{\max}$ involved in (3.8) give bounds on $E[\phi(X)]$ for every function $\phi$ that is $s-2$ times differentiable, with a convex $(s-2)$th derivative (such functions are called $s$-convex; see Roberts & Varberg (1973) for details). In that context, $X^{(s)}_{\min}$ is called the $s$-convex minimum, and $X^{(s)}_{\max}$ is called the $s$-convex maximum.

Table 3.5 Probability distribution of $X^{(s)}_{\min}\in\mathcal{B}_s([0,b])$ achieving the lower bound in (3.8) for $s=1$ to $5$.

$s=1$: support point $0$, with mass $1$.

$s=2$: support point $\mu_1$, with mass $1$.

$s=3$: support points $0$ and $\mu_2/\mu_1$, with masses $\dfrac{\mu_2-\mu_1^2}{\mu_2}$ and $\dfrac{\mu_1^2}{\mu_2}$.

$s=4$: support points

$$r_\pm = \frac{\mu_3-\mu_1\mu_2\pm\sqrt{(\mu_3-\mu_1\mu_2)^2-4(\mu_2-\mu_1^2)(\mu_1\mu_3-\mu_2^2)}}{2(\mu_2-\mu_1^2)},$$

with masses $\dfrac{\mu_1-r_-}{r_+-r_-}$ on $r_+$ and $1-\dfrac{\mu_1-r_-}{r_+-r_-}$ on $r_-$.

$s=5$: support points $0$, $t_+$ and $t_-$, where

$$t_\pm = \frac{\mu_1\mu_4-\mu_2\mu_3\pm\sqrt{(\mu_1\mu_4-\mu_2\mu_3)^2-4(\mu_1\mu_3-\mu_2^2)(\mu_2\mu_4-\mu_3^2)}}{2(\mu_1\mu_3-\mu_2^2)},$$

with masses $q_\pm = \dfrac{\mu_2-t_\mp\mu_1}{t_\pm(t_\pm-t_\mp)}$ on $t_\pm$ and $1-q_+-q_-$ on $0$.


Table 3.6 Probability distribution of $X^{(s)}_{\max}\in\mathcal{B}_s([0,b])$ achieving the upper bound in (3.8) for $s=1$ to $5$.

$s=1$: support point $b$, with mass $1$.

$s=2$: support points $0$ and $b$, with masses $\dfrac{b-\mu_1}{b}$ and $\dfrac{\mu_1}{b}$.

$s=3$: support points $\dfrac{b\mu_1-\mu_2}{b-\mu_1}$ and $b$, with masses $\dfrac{(b-\mu_1)^2}{(b-\mu_1)^2+\mu_2-\mu_1^2}$ and $\dfrac{\mu_2-\mu_1^2}{(b-\mu_1)^2+\mu_2-\mu_1^2}$.

$s=4$: support points $0$, $\dfrac{\mu_3-b\mu_2}{\mu_2-b\mu_1}$ and $b$, with masses $1-p_1-p_2$,

$$p_1 = \frac{(\mu_2-b\mu_1)^3}{(\mu_3-b\mu_2)(\mu_3-2b\mu_2+b^2\mu_1)} \quad\text{and}\quad p_2 = \frac{\mu_1\mu_3-\mu_2^2}{b\,(\mu_3-2b\mu_2+b^2\mu_1)}.$$

$s=5$: support points $z_+$, $z_-$ and $b$, where

$$z_\pm = \frac{(\mu_1-b)(\mu_4-b\mu_3)-(\mu_2-b\mu_1)(\mu_3-b\mu_2)\pm\sqrt{\Delta}}{2\big((\mu_1-b)(\mu_3-b\mu_2)-(\mu_2-b\mu_1)^2\big)},$$

with masses

$$p_\pm = \frac{\mu_2-(b+z_\mp)\mu_1+bz_\mp}{(z_\pm-z_\mp)(z_\pm-b)}$$

on $z_\pm$ and $1-p_+-p_-$ on $b$, where

$$\Delta = \big((\mu_1-b)(\mu_4-b\mu_3)-(\mu_2-b\mu_1)(\mu_3-b\mu_2)\big)^2 - 4\big((\mu_1-b)(\mu_3-b\mu_2)-(\mu_2-b\mu_1)^2\big)\big((\mu_2-b\mu_1)(\mu_4-b\mu_3)-(\mu_3-b\mu_2)^2\big).$$

Approximations Based on First Moments

A given random variable $X$ (think of the risk parameter $\Theta_i$ in credibility applications) with known moments $\mu_k$, $k=1,2,\ldots$, can be approximated either by $X^{(s)}_{\min}$ or by $X^{(s)}_{\max}$ involved in (3.8). An alternative approximation mixing these two extremal variables is also available. Let us denote as

$$\underline\mu_s = \min_{X\in\mathcal{B}_s([0,b])}E[X^s] = E\big[(X^{(s)}_{\min})^s\big] \quad\text{and}\quad \overline\mu_s = \max_{X\in\mathcal{B}_s([0,b])}E[X^s] = E\big[(X^{(s)}_{\max})^s\big]$$

the lower and upper bounds involved in (3.8). Explicit expressions of $\underline\mu_s$ and of $\overline\mu_s$ for $s$ up to five are listed in Table 3.7, in the notation of Tables 3.5–3.6. The quantity $\overline\mu_s-\underline\mu_s$ can be considered as the 'width' of $\mathcal{B}_s([0,b])$, as explained in Denuit (2002).


Table 3.7 Lower ($\underline\mu_s$) and upper ($\overline\mu_s$) bounds in (3.8) for $s=2$ to $5$.

$s=2$: $\underline\mu_2 = \mu_1^2$ and $\overline\mu_2 = b\mu_1$.

$s=3$: $\underline\mu_3 = \dfrac{\mu_2^2}{\mu_1}$ and $\overline\mu_3 = \dfrac{(b\mu_1-\mu_2)^3/(b-\mu_1)+(\mu_2-\mu_1^2)\,b^3}{(b-\mu_1)^2+\mu_2-\mu_1^2}$.

$s=4$: $\underline\mu_4 = \dfrac{\mu_1-r_-}{r_+-r_-}\,r_+^4+\left(1-\dfrac{\mu_1-r_-}{r_+-r_-}\right)r_-^4$ and $\overline\mu_4 = p_1\left(\dfrac{\mu_3-b\mu_2}{\mu_2-b\mu_1}\right)^4+p_2\,b^4$.

$s=5$: $\underline\mu_5 = q_+t_+^5+q_-t_-^5$ and $\overline\mu_5 = p_+z_+^5+p_-z_-^5+(1-p_+-p_-)\,b^5$.

Now, let $X\in\mathcal{B}_s([0,b])$. The $s$th canonical moment of $X$, denoted as $c_s(X)$, is given by

$$c_s(X) = \frac{E[X^s]-\underline\mu_s}{\overline\mu_s-\underline\mu_s}.$$

Thus, $c_s(X)$ is simply the position of the $s$th moment of $X$ relative to its possible range. Now, $c_s(X)$ gives a good indication of the 'position' of $X$ in $\mathcal{B}_s([0,b])$ with respect to the extrema $X^{(s)}_{\min}$ and $X^{(s)}_{\max}$. Therefore, we might expect that a satisfactory approximation of $X\in\mathcal{B}_s([0,b])$ is furnished by a convex combination of the stochastic extrema in $\mathcal{B}_{s-1}([0,b])$ with weights depending on the $(s-1)$th canonical moment $c_{s-1}(X)$, i.e. we use the mixture

$$
\tilde X^{(s)} = \begin{cases}
X^{(s-1)}_{\min} & \text{with probability } 1-c_{s-1}(X)\\
X^{(s-1)}_{\max} & \text{with probability } c_{s-1}(X)
\end{cases}
\qquad (3.9)
$$

in order to approximate $X$. In the following, we refer to $\tilde X^{(s)}$ as the $s$th canonical approximation of $X$. It is easily seen that $\tilde X^{(s)}\in\mathcal{B}_s([0,b])$.

The Unimodal Case

A purely discrete approximation to the risk parameter $\Theta_i$ causes problems when an experience rating plan has to be designed, as shown in Walhin & Paris (1999). The a posteriori corrections obtained with discrete $\Theta_i$'s exhibit plateaus before and after sudden jumps, which is commercially unacceptable. When $F$ only has a few support points, a 'block' structure is clearly apparent for the credibility coefficients, each block with almost constant a posteriori corrections corresponding to one support point of $F$. The policyholder is transferred from smaller to larger mass points as more claims are filed. In order to avoid this, we need a smooth risk distribution; we would therefore like to have simple continuous approximations to the risk parameter. This can be done in the unimodal case, as shown below.

A situation of practical interest in actuarial science arises when the random variables under consideration are known to have a unimodal distribution with a given mode $m$, together with the fixed moments $\mu_1,\mu_2,\ldots,\mu_{s-1}$.


The Gamma, LogNormal and Inverse Gaussian mixing distributions examined in Chapter 2 are all unimodal. Henceforth, we denote by $\mathcal{B}_s([0,b];m\text{-unim})$ the unimodal moment space of all the random variables of this type.

In the following, we use the notation $\mathrm{Uni}(m,z)$, with $m,z\in\mathbb{R}$, for the uniform distribution on the interval $[\min(m,z),\max(m,z)]$: if $m\neq z$, it is the law with constant probability density function equal to $1/|m-z|$ on that interval; if $m=z$, it is the law degenerated at the point $m$. Given some random variable $Z$, we denote by $\mathrm{Uni}(m,Z)$ the mixed uniform distribution with random extremal point $Z$ as mixing parameter.

A convenient representation of unimodal distributions is provided by Khinchine's theorem (see, e.g., Theorem 1.3 in Dharmadhikari & Joag-Dev (1988)). This theorem states that a random variable $Y$ has a unimodal law with a mode at 0 if, and only if, $Y$ is distributed as $UZ$, where $U$ and $Z$ are two independent random variables and $U\sim\mathrm{Uni}(0,1)$. This condition can be rewritten as $Y\sim\mathrm{Uni}(0,Z)$ for some random variable $Z$.

Now, let $X$ be any random variable valued in $[0,b]$ and with a unique mode at $m$. By Khinchine's theorem,

$$X \sim \mathrm{Uni}(m,\tilde Z) \quad\text{where } \tilde Z = m+Z. \qquad (3.10)$$

Note that $Z$ is valued in $[-m,b-m]$. Moreover, the moments $\mu_j$ of $X$ and $\tilde\mu_j$ of $\tilde Z$ are linked by simple relations. Indeed, we have

$$
\mu_j = \int\left(\frac{1}{m-z}\int_{x=z}^{m}x^j\,dx\right)dF_{\tilde Z}(z)
= \int\frac{m^{j+1}-z^{j+1}}{(m-z)(j+1)}\,dF_{\tilde Z}(z)
= \frac{1}{j+1}\int\big(m^j+m^{j-1}z+m^{j-2}z^2+\cdots+z^j\big)\,dF_{\tilde Z}(z)
= \frac{1}{j+1}\big(m^j+m^{j-1}\tilde\mu_1+m^{j-2}\tilde\mu_2+\cdots+\tilde\mu_j\big),\qquad j=1,2,\ldots \qquad (3.11)
$$

From (3.11), we get

$$\tilde\mu_j = (j+1)\,\mu_j-m\,j\,\mu_{j-1},\qquad j=1,2,\ldots \qquad (3.12)$$
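The moment relations (3.11)–(3.12) can be checked by simulation, sampling $X\sim\mathrm{Uni}(m,\tilde Z)$ through the Khinchine representation. A minimal sketch, with an arbitrary (hypothetical) choice for $\tilde Z$:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 0.5                                     # mode of X
z_tilde = rng.gamma(2.0, 1.0, 10**6)        # arbitrary mixing variable Z~ >= 0
u = rng.uniform(size=z_tilde.size)
x = m + u * (z_tilde - m)                   # X ~ Uni(m, Z~): unimodal with mode m

for j in (1, 2, 3):
    mu_j   = (x**j).mean()
    mu_jm1 = (x**(j - 1)).mean() if j > 1 else 1.0
    # (3.12) recovers the moments of Z~ from those of X; both columns should agree
    print(j, (j + 1) * mu_j - m * j * mu_jm1, (z_tilde**j).mean())
```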

Let us now come back to (3.8) in $\mathcal{B}_s([0,b];m\text{-unim})$. Specifically, we would like to determine the random variables $X^{(s)\star}_{\min}$ and $X^{(s)\star}_{\max}$ such that the inequalities

$$
E\big[(X^{(s)\star}_{\min})^s\big] \le E[X^s] \le E\big[(X^{(s)\star}_{\max})^s\big] \quad\text{hold for all } X\in\mathcal{B}_s([0,b];m\text{-unim}). \qquad (3.13)
$$

As shown in Denuit, De Vylder & Lefèvre (1999), the random variables $X^{(s)\star}_{\min}$ and $X^{(s)\star}_{\max}$ involved in (3.13) give bounds on $E[\phi(X)]$ for every $s$-convex function $\phi$. Finding the solution of (3.13) for $X$ in $\mathcal{B}_s([0,b];m\text{-unim})$ thus amounts to solving the corresponding problem in $\mathcal{B}_s([0,b])$ with the moments $\tilde\mu_j$ of $\tilde Z$ in place of the $\mu_j$'s.

Tables 3.8–3.9 give explicit expressions for the improved extremal distributions, also for values of $s$ up to five. In these tables, $\sum_{i=1}^{k}p_i\,\mathrm{Uni}(\alpha_i,\beta_i)$, $0\le p_i\le 1$, $\alpha_i,\beta_i\in\mathbb{R}$, $i=1,2,\ldots,k$, represents a mixture of the distributions $\mathrm{Uni}(\alpha_i,\beta_i)$, with respective weights $p_i$.


Table 3.8 Probability distribution of $X^{(s)\star}_{\min}\in\mathcal{B}_s([0,b];m\text{-unim})$ achieving the lower bound in (3.13), $s=1,\ldots,5$.

$s=1$: $\mathrm{Uni}(0,m)$.

$s=2$: $\mathrm{Uni}(m,\tilde\mu_1)$.

$s=3$: $\dfrac{\tilde\mu_2-\tilde\mu_1^2}{\tilde\mu_2}\,\mathrm{Uni}(0,m)+\dfrac{\tilde\mu_1^2}{\tilde\mu_2}\,\mathrm{Uni}(m,\tilde\mu_2/\tilde\mu_1)$.

$s=4$: $\dfrac{\tilde\mu_1-\tilde r_-}{\tilde r_+-\tilde r_-}\,\mathrm{Uni}(m,\tilde r_+)+\left(1-\dfrac{\tilde\mu_1-\tilde r_-}{\tilde r_+-\tilde r_-}\right)\mathrm{Uni}(m,\tilde r_-)$.

$s=5$: $(1-\tilde q_+-\tilde q_-)\,\mathrm{Uni}(0,m)+\tilde q_+\,\mathrm{Uni}(m,\tilde t_+)+\tilde q_-\,\mathrm{Uni}(m,\tilde t_-)$.

Here $\tilde r_\pm$, $\tilde t_\pm$ and $\tilde q_\pm$ are those from Table 3.5, with the $\tilde\mu_j$'s substituted for the $\mu_j$'s.

Table 3.9 Probability distribution of $X^{(s)\star}_{\max}\in\mathcal{B}_s([0,b];m\text{-unim})$ achieving the upper bound in (3.13), $s=1,\ldots,5$.

$s=1$: $\mathrm{Uni}(m,b)$.

$s=2$: $\dfrac{b-\tilde\mu_1}{b}\,\mathrm{Uni}(0,m)+\dfrac{\tilde\mu_1}{b}\,\mathrm{Uni}(m,b)$.

$s=3$: $\dfrac{(b-\tilde\mu_1)^2}{(b-\tilde\mu_1)^2+\tilde\mu_2-\tilde\mu_1^2}\,\mathrm{Uni}\!\left[m,\dfrac{b\tilde\mu_1-\tilde\mu_2}{b-\tilde\mu_1}\right]+\dfrac{\tilde\mu_2-\tilde\mu_1^2}{(b-\tilde\mu_1)^2+\tilde\mu_2-\tilde\mu_1^2}\,\mathrm{Uni}(m,b)$.

$s=4$: $(1-\tilde p_1-\tilde p_2)\,\mathrm{Uni}(0,m)+\tilde p_1\,\mathrm{Uni}\!\left[m,\dfrac{\tilde\mu_3-b\tilde\mu_2}{\tilde\mu_2-b\tilde\mu_1}\right]+\tilde p_2\,\mathrm{Uni}(m,b)$.

$s=5$: $\tilde p_+\,\mathrm{Uni}(m,\tilde z_+)+\tilde p_-\,\mathrm{Uni}(m,\tilde z_-)+(1-\tilde p_+-\tilde p_-)\,\mathrm{Uni}(m,b)$.

Here $\tilde p_1$, $\tilde p_2$, $\tilde z_\pm$ and $\tilde p_\pm$ are those from Table 3.6, with the $\tilde\mu_j$'s substituted for the $\mu_j$'s.

Application to Credibility Formulas

Considering a risk parameter $\Theta_i$ with moments $\mu_1,\ldots,\mu_{s-1}$ and support $[0,b]$, $b$ possibly infinite, the idea is now to approximate $\Theta_i$ by either $\Theta_i^{\min}$ or $\Theta_i^{\max}$ according to the form of the support of $\Theta_i$. This amounts to replacing the general formulas (3.2) with the simpler (3.6)–(3.7), where the support points $\theta_1,\ldots,\theta_q$ are those given in Tables 3.5–3.6. Note that (3.2) can be rewritten as

$$
E[\Theta_i\mid N_{i1}=k_1,\ldots,N_{iT_i}=k_{T_i}]
= \frac{E\big[\exp(-\lambda_{i\bullet}\Theta_i)\,\Theta_i^{k_\bullet+1}\big]}{E\big[\exp(-\lambda_{i\bullet}\Theta_i)\,\Theta_i^{k_\bullet}\big]},
$$

so that we need to approximate the Mellin transform

$$M(k,\rho) = E\big[\exp(-\rho\,\Theta_i)\,\Theta_i^{k}\big]$$

for any integer $k$ and real $\rho>0$.
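As a quick numerical sanity check of this representation (using SciPy for the integration; the parameter values are illustrative, not taken from the book), the ratio of Mellin transforms can be computed for a Gamma structure function and compared with the exact Poisson-Gamma posterior mean $(a+k_\bullet)/(a+\lambda_{i\bullet})$:

```python
import numpy as np
from scipy import integrate
from scipy.stats import gamma

a_hat, lam_past, k = 1.065, 0.3, 2        # illustrative values

def mellin(kk, rho):
    """M(kk, rho) = E[exp(-rho Theta) Theta^kk] for Theta ~ Gam(a_hat, a_hat)."""
    f = lambda th: np.exp(-rho * th) * th**kk * gamma.pdf(th, a_hat, scale=1/a_hat)
    return integrate.quad(f, 0, np.inf)[0]

ratio = mellin(k + 1, lam_past) / mellin(k, lam_past)   # E[Theta | N_past = k]
print(ratio, (a_hat + k) / (a_hat + lam_past))          # both ~2.2454
```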


When the risk parameter $\Theta_i$ is known to be unimodal (with mode $m$, say), the improved extremal distributions of Tables 3.8–3.9 can be used in lieu of those coming from Tables 3.5–3.6. This amounts to evaluating the numerator and denominator of (3.2), taking for the structure function $F$ a discrete mixture of uniform distributions. This provides an easy-to-compute approximation for (3.2) based on incomplete Gamma functions, as can be seen from the following example.

Example 3.1 Assume that $\Theta_i$ has support $\mathbb{R}^+$, mode $m$ and moments $\mu_1$ and $\mu_2$. Then, we can use the approximation $M(k,\rho)\approx M^{(3)}_{\min}(k,\rho)$, where

$$
M^{(3)}_{\min}(k,\rho)
= \frac{3\mu_2+2\mu_1m-m^2-4\mu_1^2}{(m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\;\frac{1}{m}\int_0^m\exp(-\rho\theta)\,\theta^k\,d\theta
+ \frac{(m-2\mu_1)^2}{(m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\;\frac{1}{\overline c-\underline c}\int_{\underline c}^{\overline c}\exp(-\rho\theta)\,\theta^k\,d\theta
$$

with $\underline c$ and $\overline c$ defined as

$$\underline c = \min\left(m,\frac{3\mu_2-2m\mu_1}{2\mu_1-m}\right) \quad\text{and}\quad \overline c = \max\left(m,\frac{3\mu_2-2m\mu_1}{2\mu_1-m}\right).$$

The value of $M^{(3)}_{\min}(k,\rho)$ is easily obtained from the (regularized) incomplete Gamma function $\Gamma(k+1,\cdot)$. Specifically,

$$
M^{(3)}_{\min}(k,\rho)
= \frac{3\mu_2+2\mu_1m-m^2-4\mu_1^2}{(m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\;\frac{k!\;\Gamma(k+1,\rho m)}{\rho^{k+1}\,m}
+ \frac{(m-2\mu_1)^2}{(m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\;\frac{k!\,\big(\Gamma(k+1,\rho\,\overline c)-\Gamma(k+1,\rho\,\underline c)\big)}{\rho^{k+1}\,(\overline c-\underline c)}.
$$

Example 3.2 Now, if the support of $\Theta_i$ is known to be contained in $[0,b]$, then we can use the approximation $M(k,\rho)\approx M^{(3)}_{\max}(k,\rho)$, where

$$
M^{(3)}_{\max}(k,\rho)
= \frac{3\mu_2+2\mu_1m-m^2-4\mu_1^2}{(b+m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\;\frac{1}{b-m}\int_m^b\exp(-\rho\theta)\,\theta^k\,d\theta
+ \frac{(b+m-2\mu_1)^2}{(b+m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\;\frac{1}{\overline d-\underline d}\int_{\underline d}^{\overline d}\exp(-\rho\theta)\,\theta^k\,d\theta
$$

with $\underline d$ and $\overline d$ defined as

$$\underline d = \min\left(m,\frac{2b\mu_1+2\mu_1m-bm-3\mu_2}{b+m-2\mu_1}\right) \quad\text{and}\quad \overline d = \max\left(m,\frac{2b\mu_1+2\mu_1m-bm-3\mu_2}{b+m-2\mu_1}\right).$$

As above, this expression can be simplified with the help of the incomplete Gamma function as

$$
M^{(3)}_{\max}(k,\rho)
= \frac{3\mu_2+2\mu_1m-m^2-4\mu_1^2}{(b+m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\times\frac{k!\,\big(\Gamma(k+1,\rho b)-\Gamma(k+1,\rho m)\big)}{\rho^{k+1}\,(b-m)}
+ \frac{(b+m-2\mu_1)^2}{(b+m-2\mu_1)^2+3\mu_2+2\mu_1m-m^2-4\mu_1^2}\times\frac{k!\,\big(\Gamma(k+1,\rho\,\overline d)-\Gamma(k+1,\rho\,\underline d)\big)}{\rho^{k+1}\,(\overline d-\underline d)}.
$$

Example 3.3 If the third moment of $\Theta_i$, $\mu_3$ say, is known, then we can use a mixture of the two preceding approximations, with weights defined by the canonical moments as suggested by (3.9). To this end, writing $\kappa = 3\mu_2+2\mu_1m-m^2-4\mu_1^2$ for the quantity recurring in the two preceding examples, note that

$$
\underline\mu_3^\star = \min_{X\in\mathcal{B}_3([0,b];\,m\text{-unim};\,\mu_1,\mu_2)}E[X^3]
= \frac{\kappa}{(m-2\mu_1)^2+\kappa}\,\frac{m^3}{4}
+ \frac{(m-2\mu_1)^2}{(m-2\mu_1)^2+\kappa}\,\frac{\overline c^{\,4}-\underline c^{\,4}}{4\,(\overline c-\underline c)}
$$

and

$$
\overline\mu_3^\star = \max_{X\in\mathcal{B}_3([0,b];\,m\text{-unim};\,\mu_1,\mu_2)}E[X^3]
= \frac{\kappa}{(b+m-2\mu_1)^2+\kappa}\,\frac{b^4-m^4}{4\,(b-m)}
+ \frac{(b+m-2\mu_1)^2}{(b+m-2\mu_1)^2+\kappa}\,\frac{\overline d^{\,4}-\underline d^{\,4}}{4\,(\overline d-\underline d)}.
$$

The approximation to the Mellin transform is then as follows:

$$
M(k,\rho) \approx \frac{\overline\mu_3^\star-\mu_3}{\overline\mu_3^\star-\underline\mu_3^\star}\,M^{(3)}_{\min}(k,\rho)
+ \frac{\mu_3-\underline\mu_3^\star}{\overline\mu_3^\star-\underline\mu_3^\star}\,M^{(3)}_{\max}(k,\rho).
$$


Numerical Illustration

Since we worked with mixing distributions with unbounded support, we cannot use the maximal variables described above. In practice, setting $b$ equal to a high quantile (99.99 %, say) of the mixing distribution is expected to give good results.

Let us consider the Negative Binomial model for the annual claim number in Portfolio A. In this case, $\Theta_i\sim\mathcal{G}am(a,a)$, with estimated parameter $\hat a = 1.065$. The estimated moments of $\Theta_i$ are

$$\hat\mu_1 = 1,\qquad \hat\mu_2 = \frac{\hat a+1}{\hat a} = 1.939,\qquad \hat\mu_3 = \frac{(\hat a+1)(\hat a+2)}{\hat a^2} = 5.580,\qquad \hat\mu_4 = \frac{(\hat a+1)(\hat a+2)(\hat a+3)}{\hat a^3} = 21.299.$$

Applying formula (3.7) with the discrete approximations given by Table 3.5, we can approximate $\Theta_i$ with the help of the 4-convex minimum with two support points, $r_+ = 3.2883$ with probability $0.1521$ and $r_- = 0.5897$ with probability $0.8479$, or with the 5-convex minimum with three support points, $t_+ = 4.5218$ with probability $0.0474$, $t_- = 1.2341$ with probability $0.6366$ and $0$ with probability $0.3160$. The accuracy of these approximations decreases with $T_i$ and $k_\bullet$: discrete approximations are useful for short claim histories and rather good driving records. Since the Gamma distribution (playing the role of the mixing distribution in the Negative Binomial model) has a unimodal probability density function, we can also use a mixture of uniform distributions to approximate $\Theta_i$ in Portfolio A. It turns out that this approximation is satisfactory, except for high values of $k_\bullet$. Again, this is due to the fact that the approximation uses $s$-convex minima, which underestimate the riskiness of the worst drivers.

To get accurate approximations we need to use a mixture (3.9) of the improved 4-convex extrema in the unimodal case, taking for $b$ the 99.99 % quantile of the Gamma distribution. This gives the results displayed in Table 3.10 for a good driver, in Table 3.11 for an average driver and in Table 3.12 for a bad driver. We can see that the approximations are now very satisfactory for the vast majority of the combinations $T_i$–$k_\bullet$.
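The extremal support points of Table 3.5 are straightforward to compute from the moments. The following sketch (our own code, under the reconstructed formulas above) reproduces the 4-convex and 5-convex minima just quoted for Portfolio A:

```python
import numpy as np

a = 1.065                                   # estimated Gamma parameter, Portfolio A
mu1 = 1.0
mu2 = (a + 1) / a
mu3 = (a + 1) * (a + 2) / a**2
mu4 = (a + 1) * (a + 2) * (a + 3) / a**3

# 4-convex minimum (Table 3.5, s = 4): two support points r_- < r_+
d4 = np.sqrt((mu3 - mu1*mu2)**2 - 4*(mu2 - mu1**2)*(mu1*mu3 - mu2**2))
r_plus  = (mu3 - mu1*mu2 + d4) / (2*(mu2 - mu1**2))
r_minus = (mu3 - mu1*mu2 - d4) / (2*(mu2 - mu1**2))
p_plus  = (mu1 - r_minus) / (r_plus - r_minus)
print(r_plus, p_plus, r_minus, 1 - p_plus)      # ~3.288, 0.152, 0.590, 0.848

# 5-convex minimum (Table 3.5, s = 5): support {0, t_-, t_+}
d5 = np.sqrt((mu1*mu4 - mu2*mu3)**2 - 4*(mu1*mu3 - mu2**2)*(mu2*mu4 - mu3**2))
t_plus  = (mu1*mu4 - mu2*mu3 + d5) / (2*(mu1*mu3 - mu2**2))
t_minus = (mu1*mu4 - mu2*mu3 - d5) / (2*(mu1*mu3 - mu2**2))
q_plus  = (mu2 - t_minus*mu1) / (t_plus * (t_plus - t_minus))
q_minus = (mu2 - t_plus*mu1) / (t_minus * (t_minus - t_plus))
print(t_plus, q_plus, t_minus, q_minus, 1 - q_plus - q_minus)
# ~4.522, 0.047, 1.234, 0.637, 0.316 -- the figures quoted above
```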

3.3.9 Linear Credibility

Bayesian statistics offer an intellectually acceptable approach to credibility theory. The Bayes revision $E[\Theta_i\mid N_{i\bullet}]$ of the heterogeneity component is theoretically very satisfying but is often difficult to compute (except for conjugate distributions or discrete approximations). Practical applications involve numerical methods to perform integration with respect to a posteriori distributions, making more elementary approaches desirable (at least to get a first easy-to-compute approximation of the result). Because we have observed $N_{i1},\ldots,N_{iT_i}$, one suggestion is to approximate $E[\Theta_i\mid N_{i1},\ldots,N_{iT_i}]$ by a linear function of the $N_{it}$'s.


Table 3.10 Values of $E[\Theta_i\mid N_{i\bullet}=k_\bullet]$ for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for a good driver from Portfolio A (expected annual claim frequency of 9.28 %) and for a mixture of mixed uniform approximations of $\Theta_i$ (improved 4-convex extrema).

T_i     k• = 0     k• = 1      k• = 2      k• = 3      k• = 4      k• = 5
 1      91.98 %   178.36 %    264.74 %    351.04 %    437.76 %    524.92 %
 2      85.16 %   165.12 %    245.13 %    325.04 %    404.57 %    486.88 %
 3      79.28 %   153.70 %    228.23 %    302.76 %    376.26 %    451.61 %
 4      74.16 %   143.77 %    213.51 %    283.40 %    352.14 %    420.74 %
 5      69.66 %   135.06 %    200.54 %    266.38 %    331.27 %    394.52 %
 6      65.67 %   127.37 %    189.00 %    251.28 %    312.83 %    372.29 %
 7      62.18 %   120.55 %    178.65 %    237.81 %    296.27 %    353.06 %
 8      58.93 %   114.47 %    169.29 %    225.73 %    281.26 %    335.95 %
 9      56.05 %   109.04 %    160.80 %    214.82 %    267.62 %    320.35 %
10      53.43 %   104.17 %    153.05 %    204.90 %    255.23 %    305.90 %

Table 3.11 Values of $E[\Theta_i\mid N_{i\bullet}=k_\bullet]$ for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for an average driver from Portfolio A (expected annual claim frequency of 14.09 %) and for a mixture of mixed uniform approximations of $\Theta_i$ (improved 4-convex extrema).

T_i     k• = 0     k• = 1      k• = 2      k• = 3      k• = 4      k• = 5
 1      88.32 %   171.25 %    254.21 %    337.07 %    419.97 %    505.05 %
 2      79.09 %   153.34 %    227.69 %    302.05 %    375.37 %    450.47 %
 3      71.60 %   138.83 %    206.16 %    273.75 %    340.28 %    405.73 %
 4      65.41 %   126.87 %    188.25 %    250.30 %    311.63 %    370.88 %
 5      60.21 %   116.90 %    173.05 %    230.58 %    287.30 %    342.82 %
 6      55.76 %   108.51 %    159.96 %    213.75 %    266.28 %    318.81 %
 7      51.93 %   101.38 %    148.58 %    199.16 %    248.12 %    297.44 %
 8      48.58 %    95.27 %    138.64 %    186.29 %    232.40 %    278.28 %
 9      45.63 %    89.96 %    129.95 %    174.75 %    218.71 %    261.30 %
10      43.01 %    85.30 %    122.35 %    164.32 %    206.58 %    246.42 %

Basically, the actuary still resorts to a quadratic loss function, but the shape of the credibility predictor is constrained ex ante to be linear in the past observations, i.e. the predictor $\hat N_{iT_i+1}$ of $N_{iT_i+1}$ is of the form

$$\hat N_{iT_i+1} = c_{i0}+\sum_{t=1}^{T_i}c_{it}N_{it}.$$

The coefficients $c_{i0}$ and the $c_{it}$'s involved in $\hat N_{iT_i+1}$ are obtained from the minimization of an expected squared difference.


Table 3.12 Values of $E[\Theta_i\mid N_{i\bullet}=k_\bullet]$ for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for a bad driver from Portfolio A (expected annual claim frequency of 28.40 %) and for a mixture of mixed uniform approximations of $\Theta_i$ (improved 4-convex extrema).

T_i     k• = 0     k• = 1      k• = 2      k• = 3      k• = 4      k• = 5
 1      78.95 %   153.07 %    227.29 %    301.52 %    374.70 %    449.61 %
 2      65.22 %   126.50 %    187.69 %    249.57 %    310.74 %    369.83 %
 3      55.55 %   108.12 %    159.34 %    212.96 %    265.29 %    317.66 %
 4      48.36 %    94.88 %    138.01 %    185.46 %    231.41 %    277.06 %
 5      42.80 %    84.93 %    121.75 %    163.48 %    205.61 %    245.24 %
 6      38.39 %    77.09 %    109.33 %    145.45 %    184.61 %    220.87 %
 7      34.79 %    70.65 %     99.70 %    130.79 %    166.57 %    201.26 %
 8      31.83 %    65.18 %     92.05 %    119.03 %    150.92 %    184.27 %
 9      29.35 %    60.42 %     85.75 %    109.64 %    137.59 %    168.95 %
10      27.24 %    56.22 %     80.40 %    102.06 %    126.49 %    155.15 %

Specifically, we look for $c_{i0}$ and $c_{it}$'s, $t=1,\ldots,T_i$, such that the expected squared difference between $N_{iT_i+1}$ and its prediction $\hat N_{iT_i+1}$ is minimum, i.e. such that

$$c = \arg\min_c O_1,$$

where

$$O_1 = E\left[\left(N_{iT_i+1}-c_{i0}-\sum_{t=1}^{T_i}c_{it}N_{it}\right)^2\right].$$

Alternatively, it can be shown that $c$ also solves

$$c = \arg\min_c O_j,\qquad j=2,3,$$

where

$$O_2 = E\left[\left(E[N_{iT_i+1}\mid\Theta_i]-c_{i0}-\sum_{t=1}^{T_i}c_{it}N_{it}\right)^2\right]$$

$$O_3 = E\left[\left(E[N_{iT_i+1}\mid N_{i1},\ldots,N_{iT_i}]-c_{i0}-\sum_{t=1}^{T_i}c_{it}N_{it}\right)^2\right].$$

Let us show for instance that $\arg\min_c O_1 = \arg\min_c O_2$. To this end, let us write

$$O_1 = E\left[\left(\Big(N_{iT_i+1}-E[N_{iT_i+1}\mid\Theta_i]\Big)+\left(E[N_{iT_i+1}\mid\Theta_i]-c_{i0}-\sum_{t=1}^{T_i}c_{it}N_{it}\right)\right)^2\right]$$


and let us expand the squared sum to get

$$
O_1 = E\left[\Big(N_{iT_i+1}-E[N_{iT_i+1}\mid\Theta_i]\Big)^2\right]
+ 2E\left[\Big(N_{iT_i+1}-E[N_{iT_i+1}\mid\Theta_i]\Big)\left(E[N_{iT_i+1}\mid\Theta_i]-c_{i0}-\sum_{t=1}^{T_i}c_{it}N_{it}\right)\right] + O_2.
$$

The second term in the expansion of $O_1$ vanishes (conditionally on $\Theta_i$, the annual claim numbers are independent, so the prediction error of year $T_i+1$ is uncorrelated with any function of $\Theta_i$ and the past claims), and the first one does not depend on the $c_{it}$'s.
Let us determine c as arg min 2 . Recall that EN iTi +1 i = iTi +1 i . Setting equal to 0<br />

the partial derivative <strong>of</strong> 2 with respect to c i0 allows us to write<br />

which gives<br />

T<br />

∑ i<br />

0 = iTi +1E i − c i0 − c it EN it <br />

t=1<br />

T<br />

∑ i<br />

c i0 = iTi +1 − c it it (3.14)<br />

Now, setting equal to 0 the partial derivatives <strong>of</strong> 2 with respect to c is for s = 1 2T i ,<br />

yields<br />

Noting that<br />

t=1<br />

T<br />

∑ i<br />

0 = iTi +1EN is i − c i0 EN is − c it EN is N it (3.15)<br />

EN it N is = CN is N it + EN is EN it <br />

[<br />

] [<br />

]<br />

= E CN is N it i + C EN is i EN it i + is it<br />

⎧<br />

⎨ is + ( ) 21<br />

is + Vi if s = t<br />

=<br />

⎩<br />

is it 1 + V i otherwise<br />

t=1<br />

t=1<br />

and that<br />

[<br />

]<br />

]<br />

EN is i = E CN is i i + C<br />

[EN is i i + is<br />

= is 1 + V i <br />

allows us to cast equation (3.15) into the form<br />

(<br />

)<br />

T<br />

∑ i<br />

c is = V i iTi +1 − c it it = V i c i0 <br />

t=1


Hence, the value of $c_{is}$ does not depend on $s$, and the same weight is given to all the past annual claim numbers. Now, inserting the value of $c_{is}$ that we just obtained in (3.14) finally gives

$$c_{i0} = \frac{\lambda_{iT_i+1}}{1+\mathbb{V}[\Theta_i]\sum_{t=1}^{T_i}\lambda_{it}},$$

which in turn yields

$$c_{is} = \frac{\lambda_{iT_i+1}\,\mathbb{V}[\Theta_i]}{1+\mathbb{V}[\Theta_i]\sum_{t=1}^{T_i}\lambda_{it}}.$$

The expected claim frequency for year $T_i+1$ given past claims history is

$$\hat N_{iT_i+1} = \lambda_{iT_i+1}\,\frac{1+\mathbb{V}[\Theta_i]\,N_{i\bullet}}{1+\mathbb{V}[\Theta_i]\,\lambda_{i\bullet}} \qquad (3.16)$$

where $N_{i\bullet}$ and $\lambda_{i\bullet}$ have been defined in (3.1). Note that $\hat N_{iT_i+1}$ is the best linear predictor of the true unknown mean $\lambda_{iT_i+1}\Theta_i$, of the Bayesian credibility premium $E[N_{iT_i+1}\mid N_{i1},\ldots,N_{iT_i}]$ and of the number of claims $N_{iT_i+1}$ for year $T_i+1$.

The linear predictor for year $T_i+1$ thus appears as the product of the a priori expected claim frequency, $\lambda_{iT_i+1}$, and an approximation of the theoretical correction $E[\Theta_i\mid N_{i1},\ldots,N_{iT_i}]$. This approximation possesses a particularly simple interpretation, since it entails a malus when $N_{i\bullet}>\lambda_{i\bullet}$, that is, if the policyholder reported more claims than expected a priori.
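Formula (3.16) is a one-liner to implement. A small sketch (the helper name is ours; recall that the Gamma structure function gives $\mathbb{V}[\Theta_i]=1/a$):

```python
def linear_credibility(lam_next, lam_past, n_past, var_theta):
    """Best linear predictor (3.16) of next year's claim number."""
    return lam_next * (1 + var_theta * n_past) / (1 + var_theta * lam_past)

# Good driver from Portfolio A (annual frequency 9.28 %) after three
# claim-free years, with the Gamma structure function: Var(Theta) = 1/a.
a_hat, lam = 1.065, 0.0928
n_hat = linear_credibility(lam, 3 * lam, 0, 1 / a_hat)
print(n_hat, n_hat / lam)   # ~0.0736 and ~0.7928: the 79.28 % correction of Table 3.10
```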

Remark 3.2 Note that (3.16) agrees with the result obtained in the Poisson-Gamma case. This is because the Bayesian credibility premium is linear in the past observations in the Poisson-Gamma case. The term exact credibility is used to describe the situation where the linear credibility premium equals the Bayesian one. Intuitively speaking, using linear credibility formulas in the mixed Poisson model boils down to approximating the mixing distribution with the Gamma distribution (or, equivalently, the distribution of the claim numbers with the Negative Binomial one).

Remark 3.3 Note that $\hat N_{iT_i+1}$ could have been obtained by a direct application of the Bühlmann–Straub formula. Let us consider a sequence of random variables $X_1,X_2,X_3,\ldots$ such that, given a random variable $\Theta$, the $X_t$'s are independent, with finite first and second moments

$$\mu(\Theta) = E[X_t\mid\Theta] \quad\text{and}\quad E[X_t] = \mu = E[\mu(\Theta)].$$

Now, define $M^2$ and $\sigma^2$ through

$$E\big[\mathbb{V}[X_t\mid\Theta]\big] = \frac{\sigma^2}{w_t} \quad\text{and}\quad \mathbb{V}\big[E[X_t\mid\Theta]\big] = M^2,$$

and put

$$w_\bullet = \sum_{t=1}^{n}w_t \quad\text{and}\quad \tilde X_n = \frac{1}{w_\bullet}\sum_{j=1}^{n}w_jX_j.$$

Clearly, $\tilde X_n$ is the weighted average of the $X_j$'s. The minimum of

$$E\Big[\big(\mu(\Theta)-a-b\tilde X_n\big)^2\Big]$$

over all the couples $(a,b)$ is obtained for

$$a = \frac{\sigma^2\,\mu}{\sigma^2+w_\bullet M^2} \quad\text{and}\quad b = \frac{w_\bullet M^2}{\sigma^2+w_\bullet M^2}.$$

To apply this general result to the Poisson case, we define $X_j = N_{ij}/\lambda_{ij}$, which gives $\mu(\Theta_i)=\Theta_i$, $\mu=1$, $w_j=\lambda_{ij}$, $\sigma^2=1$ and $M^2=\mathbb{V}[\Theta_i]$. The best linear approximation to $\Theta_i$ is then given by $\hat N_{iT_i+1}/\lambda_{iT_i+1}$.
3.4 <strong>Credibility</strong> Formulas with an Exponential Loss Function<br />

3.4.1 Optimal Predictor<br />

This section purposes to describe an alternative approach based on an exponential loss<br />

function. The exponential loss function is asymmetric and possesses one parameter that<br />

reflects the severity <strong>of</strong> the credibility correction. This allows us to s<strong>of</strong>ten the a posteriori<br />

corrections in case <strong>of</strong> claims, keeping the financial balance.<br />

When the new premium amount is fixed by the insurer, two kinds <strong>of</strong> errors may arise: either<br />

the policyholder is undercharged and the insurance company loses its money or the insured is<br />

overcharged and the insurer is at risk <strong>of</strong> losing the policy. In order to penalize large mistakes to<br />

a greater extent, it is usually assumed that the loss function is a nonnegative convex function<br />

<strong>of</strong> the error. The loss is zero when no error is made and strictly positive otherwise. The loss<br />

function is generally taken to be quadratic as in the preceding section. Among other choices<br />

we find also the absolute loss and the 4-degree loss; see, e.g., Lemaire & Vandermeulen<br />

(1983). The problem with these two last losses is that the resulting bonus-malus systems are<br />

unbalanced.<br />

We give here a technical result involving an exponential loss function. It is the analogue<br />

<strong>of</strong> Proposition 3.1.<br />

Proposition 3.2<br />

Under the conditions <strong>of</strong> Proposition 3.1, the minimum <strong>of</strong><br />

[ (<br />

E exp − c ( T+1 − X 1 X 2 X T ))]<br />

on all the measurable functions T → satisfying the constraint<br />

EX 1 X 2 X T = T+1 is obtained for


150 <strong>Actuarial</strong> <strong>Modelling</strong> <strong>of</strong> <strong>Claim</strong> <strong>Counts</strong><br />

⋆⋆ X 1 X 2 X T = T+1 + 1 c<br />

(E [ ln E [ exp−c T+1 X 1 X 2 X T<br />

]]<br />

− ln E [ exp−c T+1 X 1 X 2 X T<br />

] ) <br />

Pro<strong>of</strong><br />

Starting from<br />

[ (<br />

E exp − c ( T+1 − X 1 X 2 X T ))]<br />

[<br />

= E exp cX 1 X 2 X T E [ ] ]<br />

exp−c T+1 X 1 X 2 X T<br />

[ (<br />

)]<br />

= E exp cX 1 X 2 X T − ⋆⋆ X 1 X 2 X T <br />

( [<br />

× expc T+1 exp E ln E [ ] ])<br />

exp−c T+1 X 1 X 2 X T <br />

Now, let us apply Jensen’s inequality to get<br />

[ (<br />

E exp − c ( T+1 − X 1 X 2 X T ))]<br />

( [<br />

])<br />

≥ exp cE X 1 X 2 X T − ⋆⋆ X 1 X 2 X T <br />

( [<br />

expc T+1 exp E ln E [ ] ])<br />

exp−c T+1 X 1 X 2 X T <br />

Because <strong>of</strong> the constraint on the expectation <strong>of</strong> the s, the first exponential is 1, yielding<br />

[ (<br />

E exp − c ( T+1 − X 1 X 2 X T ))]<br />

( [<br />

≥ expc T+1 exp E ln E [ ] ])<br />

exp−c T+1 X 1 X 2 X T<br />

[ (<br />

= E exp − c ( T+1 − ⋆⋆ X 1 X 2 X T ))]<br />

which is the expected result.<br />

□<br />

Remark that in Proposition 3.2 the constraint is made in order to guarantee the financial<br />

equilibrium.<br />

Let us now apply the result contained in Proposition 3.2 to the credibility problem. In this<br />

case,<br />

X t = N it and t i = it i<br />

so that the optimal predictor <strong>of</strong> N iTi +1 for the exponential loss function is <strong>of</strong> the form<br />

⋆⋆ N i1 N iTi = iTi +1 + 1 (E [ ln E [ ]]<br />

exp−c<br />

c<br />

iTi +1 i N i1 N iTi<br />

− ln E [ ] )<br />

exp−c iTi +1 i N i1 N iTi


3.4.2 Poisson-Gamma Credibility Model

Let us now apply the result contained in Proposition 3.2 to the Poisson-Gamma credibility model. To this end, assume that $\Theta_i\sim\mathcal{G}am(a,a)$. Given $N_{i\bullet}$, we know from (3.4) that $\Theta_i$ follows the $\mathcal{G}am(a+N_{i\bullet},a+\lambda_{i\bullet})$ distribution, so that we know from (1.36) that

$$E\big[\exp(-c\lambda_{iT_i+1}\Theta_i)\mid N_{i1},\ldots,N_{iT_i}\big] = \left(\frac{a+\lambda_{i\bullet}}{a+\lambda_{i\bullet}+c\lambda_{iT_i+1}}\right)^{a+N_{i\bullet}}.$$

It follows that

$$\ln E\big[\exp(-c\lambda_{iT_i+1}\Theta_i)\mid N_{i1},\ldots,N_{iT_i}\big] = -(a+N_{i\bullet})\ln\left(1+\frac{c\lambda_{iT_i+1}}{a+\lambda_{i\bullet}}\right)$$

and, since $E[N_{i\bullet}]=\lambda_{i\bullet}$,

$$E\Big[\ln E\big[\exp(-c\lambda_{iT_i+1}\Theta_i)\mid N_{i1},\ldots,N_{iT_i}\big]\Big] = -(a+\lambda_{i\bullet})\ln\left(1+\frac{c\lambda_{iT_i+1}}{a+\lambda_{i\bullet}}\right).$$

Proposition 3.2 then gives

$$\Psi^{\star\star}(k_{i1},\ldots,k_{iT_i}) = \lambda_{iT_i+1}+\frac{k_{i\bullet}-\lambda_{i\bullet}}{c}\,\ln\left(1+\frac{c\lambda_{iT_i+1}}{a+\lambda_{i\bullet}}\right). \qquad (3.17)$$

Considering (3.17), we see that $\Psi^{\star\star}(k_{i1},\ldots,k_{iT_i})$ is equal to the a priori expectation $\lambda_{iT_i+1} = E[N_{iT_i+1}]$ plus a correction term. This correction is positive, so that

$$\Psi^{\star\star}(k_{i1},\ldots,k_{iT_i}) > \lambda_{iT_i+1}$$

if $k_{i\bullet}>\lambda_{i\bullet}$, that is, if the policyholder reported more claims than expected. Otherwise, the correction is negative. As with the quadratic loss function, the penalty is caused by an excess of observed claims over expected ones.

Let us now compare the credibility formulas obtained with a quadratic and an exponential loss function. Since for any $c\ge 0$

$$\ln\left(1+\frac{c\lambda_{iT_i+1}}{a+\lambda_{i\bullet}}\right) \le \frac{c\lambda_{iT_i+1}}{a+\lambda_{i\bullet}},$$

it is easily seen that we have

$$\Psi^{\star\star}(k_{i1},\ldots,k_{iT_i}) \le \Psi^{\star}(k_{i1},\ldots,k_{iT_i}) \quad\text{if } k_{i\bullet}>\lambda_{i\bullet}$$

and

$$\Psi^{\star\star}(k_{i1},\ldots,k_{iT_i}) \ge \Psi^{\star}(k_{i1},\ldots,k_{iT_i}) \quad\text{if } k_{i\bullet}<\lambda_{i\bullet}.$$

Let us define

$$\alpha_{\exp}(c) = \frac{1}{c}\,\ln\left(1+\frac{c\lambda_{iT_i+1}}{a+\lambda_{i\bullet}}\right)$$

to be the weight given to $k_{i\bullet}$ in the a posteriori evaluation (3.17) in the Poisson-Gamma model. We have that $\lim_{c\to+\infty}\alpha_{\exp}(c) = 0$. Moreover, routine calculations show that $\frac{d}{dc}\alpha_{\exp}(c)<0$, so that the weight given to the observed average claim number decreases as $c$ increases. This provides an intuitive meaning for the parameter $c$: if $c$ increases, then the a posteriori merit-rating scheme becomes less severe, and in the limit $c\to+\infty$ the premium no longer depends on the incurred claims. If the asymmetry factor $c$ tends to $+\infty$, then all the risks within the same tariff class pay the same premium: there is no longer any experience rating. Conversely, the weight given to past claims under an exponential loss function tends to the weight under a quadratic loss function as $c\to 0$.
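Formula (3.17) is equally simple to implement. The sketch below (function name ours) reproduces the first row of Table 3.13 further down:

```python
import numpy as np

def exp_loss_correction(lam_next, lam_past, k, a, c):
    """A posteriori correction Psi**(k)/lam_next implied by (3.17)."""
    psi = lam_next + (k - lam_past) / c * np.log(1 + c * lam_next / (a + lam_past))
    return psi / lam_next

a_hat, lam, c = 1.065, 0.0928, 1.0
row = [round(100 * exp_loss_correction(lam, 1 * lam, k, a_hat, c), 1) for k in range(6)]
print(row)   # ~[92.3, 175.4, 258.5, 341.6, 424.6, 507.7]: first row of Table 3.13
             # (tiny deviations possible since the 9.28 % input is itself rounded)
```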

3.4.3 Linear Credibility

Another possibility is to determine a linear credibility premium based on an exponential loss function, by considering predictors of the form $b_0+\sum_{t=1}^{T_i}b_tN_{it}$. The $b_t$'s minimize the Lagrangian

$$
\mathcal{L}(b_0,b_1,\ldots,b_{T_i}) = E\Big[\exp\Big(-c\big(\lambda_{iT_i+1}\Theta_i-b_1N_{i1}-b_2N_{i2}-\cdots-b_{T_i}N_{iT_i}-b_0\big)\Big)\Big]
-\xi\,E\big[b_0+b_1N_{i1}+b_2N_{i2}+\cdots+b_{T_i}N_{iT_i}-\lambda_{iT_i+1}\big],
$$

where $\xi$ denotes the Lagrange multiplier attached to the financial balance constraint. Setting to 0 the derivatives of $\mathcal{L}$ with respect to $\xi,b_0,b_1,\ldots,b_{T_i}$ yields the same result as in the Poisson-Gamma case, i.e. (3.17).

3.4.4 Numerical Illustration

Let us now illustrate the use of the exponential loss function in credibility. To this end, let us consider the Negative Binomial fit to Portfolio A described in Table 2.7. Thus, $\Theta_i$ is taken to be $\mathcal{G}am(\hat a,\hat a)$ distributed, with estimated parameter $\hat a = 1.065$.

Formula (3.17) allows us to compute the a posteriori correction as a function of the number $T_i$ of coverage periods and of the total number of claims $k_\bullet$ filed with the company. The results obtained with $c=1$ are displayed in Table 3.13 for a good driver, in Table 3.14 for an average driver, and in Table 3.15 for a bad driver. Tables 3.16–3.18 are the analogues for $c=5$.

Let us first compare the a posteriori corrections listed in Table 3.13 with those of Table 3.2, corresponding to a quadratic loss function (i.e. to $c=0$).


Table 3.13 Values of the a posteriori corrections obtained from (3.17) for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for a good driver (expected annual claim frequency equal to 9.28 %) from Portfolio A, with $c=1$.

T_i     k• = 0    k• = 1     k• = 2     k• = 3     k• = 4     k• = 5
 1      92.3 %    175.4 %    258.5 %    341.5 %    424.6 %    507.7 %
 2      85.7 %    162.8 %    240.0 %    317.1 %    394.2 %    471.4 %
 3      80.0 %    151.9 %    223.9 %    295.9 %    367.9 %    439.9 %
 4      75.0 %    142.4 %    209.9 %    277.4 %    344.8 %    412.3 %
 5      70.5 %    134.0 %    197.5 %    261.0 %    324.5 %    388.0 %
 6      66.6 %    126.6 %    186.5 %    246.5 %    306.5 %    366.4 %
 7      63.1 %    119.9 %    176.7 %    233.5 %    290.3 %    347.1 %
 8      59.9 %    113.9 %    167.9 %    221.8 %    275.8 %    329.7 %
 9      57.1 %    108.5 %    159.8 %    211.2 %    262.6 %    314.0 %
10      54.5 %    103.5 %    152.6 %    201.6 %    250.7 %    299.7 %

Table 3.14 Values of the a posteriori corrections obtained from (3.17) for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for an average driver (expected annual claim frequency equal to 14.09 %) from Portfolio A, with $c=1$.

T_i     k• = 0    k• = 1     k• = 2     k• = 3     k• = 4     k• = 5
 1      89.0 %    167.4 %    245.8 %    324.3 %    402.7 %    481.1 %
 2      80.1 %    150.7 %    221.4 %    292.0 %    362.6 %    433.2 %
 3      72.9 %    137.1 %    201.3 %    265.5 %    329.8 %    394.0 %
 4      66.8 %    125.7 %    184.6 %    243.5 %    302.4 %    361.3 %
 5      61.7 %    116.1 %    170.5 %    224.9 %    279.2 %    333.6 %
 6      57.3 %    107.8 %    158.3 %    208.9 %    259.4 %    309.9 %
 7      53.5 %    100.7 %    147.8 %    195.0 %    242.1 %    289.3 %
 8      50.2 %     94.4 %    138.6 %    182.8 %    227.0 %    271.3 %
 9      47.2 %     88.9 %    130.5 %    172.1 %    213.7 %    255.4 %
10      44.6 %     83.9 %    123.3 %    162.6 %    201.9 %    241.2 %

We see that the application of an exponential loss function slightly reduces the penalties in the case of claims (the values listed in the columns $k_\bullet=1$ to $5$ are smaller in Table 3.13 than in Table 3.2). Since the financial balance is fulfilled by the credibility premiums obtained with an exponential loss function, the discounts for claim-free policyholders are also reduced (the values in the column $k_\bullet=0$ are higher in Table 3.13 than in Table 3.2).

Note however that the a posteriori corrections obtained with an exponential loss function with $c=1$ are very similar to those coming from a quadratic loss function. To see this, let us now increase the value of $c$ to 5 in Table 3.16. Increasing $c$ results in reduced discounts and also in reduced penalties.


Table 3.15 Values of the a posteriori corrections obtained from (3.17) for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for a bad driver (expected annual claim frequency equal to 28.40 %) from Portfolio A, with $c=1$.

T_i     k• = 0    k• = 1     k• = 2     k• = 3     k• = 4     k• = 5
 1      80.9 %    148.2 %    215.4 %    282.7 %    350.0 %    417.2 %
 2      67.9 %    124.4 %    180.8 %    237.3 %    293.8 %    350.2 %
 3      58.6 %    107.2 %    155.8 %    204.5 %    253.1 %    301.8 %
 4      51.5 %     94.2 %    136.9 %    179.6 %    222.4 %    265.1 %
 5      45.9 %     84.0 %    122.1 %    160.2 %    198.3 %    236.4 %
 6      41.4 %     75.8 %    110.2 %    144.5 %    178.9 %    213.3 %
 7      37.7 %     69.1 %    100.4 %    131.7 %    163.0 %    194.3 %
 8      34.7 %     63.4 %     92.2 %    120.9 %    149.7 %    178.4 %
 9      32.0 %     58.6 %     85.2 %    111.8 %    138.4 %    165.0 %
10      29.8 %     54.5 %     79.2 %    103.9 %    128.7 %    153.4 %

Table 3.16 Values of the a posteriori corrections obtained from (3.17) for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for a good driver (expected annual claim frequency equal to 9.28 %) from Portfolio A, with $c=5$.

T_i     k• = 0    k• = 1     k• = 2     k• = 3     k• = 4     k• = 5
 1      93.3 %    165.9 %    238.5 %    311.1 %    383.8 %    456.4 %
 2      87.4 %    155.4 %    223.4 %    291.4 %    359.4 %    427.4 %
 3      82.2 %    146.1 %    210.1 %    274.0 %    338.0 %    401.9 %
 4      77.6 %    137.9 %    198.3 %    258.6 %    318.9 %    379.3 %
 5      73.5 %    130.6 %    187.7 %    244.9 %    302.0 %    359.1 %
 6      69.8 %    124.0 %    178.3 %    232.5 %    286.7 %    340.9 %
 7      66.5 %    118.1 %    169.7 %    221.3 %    272.9 %    324.6 %
 8      63.4 %    112.7 %    161.9 %    211.2 %    260.4 %    309.7 %
 9      60.7 %    107.8 %    154.8 %    201.9 %    249.0 %    296.1 %
10      58.1 %    103.2 %    148.4 %    193.5 %    238.6 %    283.7 %

If we compare the a posteriori corrections for the different types of drivers, we see that the discounts increase with the average annual claim frequency, as was the case with the quadratic loss function. The penalties, on the other hand, appear to decrease with the average annual claim frequency. The fact that a priori bad drivers need a greater premium reduction when no claim is filed with the insurance company thus persists with exponential loss functions.


Table 3.17 Values of the a posteriori corrections obtained from (3.17) for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for an average driver (expected annual claim frequency equal to 14.09 %) from Portfolio A, with $c=5$.

T_i     k• = 0    k• = 1     k• = 2     k• = 3     k• = 4     k• = 5
 1      90.8 %    156.1 %    221.4 %    286.7 %    352.0 %    417.4 %
 2      83.2 %    142.9 %    202.6 %    262.4 %    322.1 %    381.8 %
 3      76.7 %    131.8 %    186.8 %    241.9 %    296.9 %    351.9 %
 4      71.2 %    122.3 %    173.3 %    224.3 %    275.4 %    326.4 %
 5      66.5 %    114.1 %    161.7 %    209.2 %    256.8 %    304.4 %
 6      62.3 %    106.9 %    151.5 %    196.0 %    240.6 %    285.2 %
 7      58.7 %    100.6 %    142.5 %    184.4 %    226.3 %    268.2 %
 8      55.4 %     95.0 %    134.5 %    174.1 %    213.7 %    253.2 %
 9      52.5 %     90.0 %    127.4 %    164.9 %    202.4 %    239.8 %
10      49.9 %     85.5 %    121.0 %    156.6 %    192.2 %    227.8 %

Table 3.18 Values of the a posteriori corrections obtained from (3.17) for different combinations of observed periods $T_i$ and number of past claims $k_\bullet$ for a bad driver (expected annual claim frequency equal to 28.40 %) from Portfolio A, with $c=5$.

T_i     k• = 0    k• = 1     k• = 2     k• = 3     k• = 4     k• = 5
 1      85.6 %    136.3 %    186.9 %    237.5 %    288.2 %    338.8 %
 2      75.0 %    119.0 %    163.1 %    207.2 %    251.2 %    295.3 %
 3      66.7 %    105.8 %    144.8 %    183.8 %    222.9 %    261.9 %
 4      60.2 %     95.2 %    130.3 %    165.3 %    200.4 %    235.5 %
 5      54.8 %     86.6 %    118.5 %    150.3 %    182.1 %    213.9 %
 6      50.3 %     79.5 %    108.6 %    137.8 %    166.9 %    196.1 %
 7      46.5 %     73.4 %    100.3 %    127.2 %    154.1 %    181.0 %
 8      43.3 %     68.2 %     93.2 %    118.2 %    143.1 %    168.1 %
 9      40.4 %     63.7 %     87.0 %    110.3 %    133.6 %    156.9 %
10      38.0 %     59.8 %     81.6 %    103.5 %    125.3 %    147.2 %

3.5 Dependence in the Mixed Poisson Credibility Model

3.5.1 Intuitive Ideas

The main focus of this section is to formalize intuitive ideas with the help of stochastic orderings. Every actuary intuitively feels that the a posteriori claim frequency distribution must become more dangerous as more claims are reported. Here we precisely define 'more dangerous' and explain why the a posteriori premium must increase with the total claim number in the mixed Poisson model.


In the model A1–A2 of Definition 3.1, we intuitively feel that the following statements are true:

Statement S1: $\Theta_i$ 'increases' in the past claims $N_{i\bullet}$;
Statement S2: $N_{iT_i+1}$ 'increases' in the past claims $N_{i\bullet}$;
Statement S3: $N_{iT_i+1}$ and $N_{i\bullet}$ are 'positively dependent'.

This section aims to precisely define the meaning of 'increases' in statements S1 and S2, as well as the nature of the 'positive dependence' involved in statement S3. The proofs will be omitted because of their technical nature; for a detailed study, we refer the reader to Denuit et al. (2005, Chapter 7). Note that we present the results in terms of stochastic dominance, whereas in fact the stronger (but less intuitive) likelihood ratio order applies.

3.5.2 Stochastic Order Relations

In order to formalize the increasingness involved in statements S1–S2, our study will extensively resort to stochastic orderings. Therefore, we recall in this section the definition of stochastic dominance, as well as some intuitive interpretations. Given two random variables $X$ and $Y$, $X$ is said to be smaller than $Y$ in stochastic dominance, written as $X\preceq_{ST}Y$, if

$$\Pr[X>t] \le \Pr[Y>t] \quad\text{for all } t\in\mathbb{R}.$$

We see that a ranking in the $\preceq_{ST}$-sense translates the intuitive meaning of 'being smaller than' in probability models: indeed, we compare the probabilities that both random variables exceed some given threshold $t$, and the smaller one in the $\preceq_{ST}$-sense has the smaller probability of exceeding the threshold. If $M$ and $N$ are two counting random variables, then

$$M \preceq_{ST} N \iff \sum_{j=k}^{+\infty}\Pr[M=j] \le \sum_{j=k}^{+\infty}\Pr[N=j] \quad\text{for all } k=0,1,\ldots$$

One intuitively feels that a random variable $N_\lambda$ following the Poisson distribution with mean $\lambda$ gets bigger as $\lambda$ increases. The next implication formalizes this intuitive statement:

$$\lambda \le \lambda' \Rightarrow N_\lambda \preceq_{ST} N_{\lambda'}.$$

The $\mathcal{P}oi(\lambda)$ family thus increases in its parameter $\lambda$ in the $\preceq_{ST}$-sense.
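A quick empirical check of this ordering for the Poisson family (parameter values illustrative):

```python
from scipy.stats import poisson

lam, lam_prime = 0.8, 1.5                 # lam <= lam_prime
# Pr[N > k] must be ordered for every threshold k:
ok = all(poisson.sf(k, lam) <= poisson.sf(k, lam_prime) for k in range(50))
print(ok)   # True: Poi(0.8) precedes Poi(1.5) in the ST-sense on k = 0..49
```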

3.5.3 Comparisons of Predictive Distributions

We then have the following results, which formalize statements S1 and S2. First, $\Theta_i$ increases in the past claims $N_{i\bullet}$ in the $\preceq_{ST}$-sense, that is,

$$[\Theta_i\mid N_{i\bullet}=k] \preceq_{ST} [\Theta_i\mid N_{i\bullet}=k'] \quad\text{for } k\le k' \qquad (3.18)$$

$$\iff \Pr[\Theta_i>t\mid N_{i\bullet}=k] \le \Pr[\Theta_i>t\mid N_{i\bullet}=k'] \quad\text{whatever } t\text{, provided } k\le k'.$$

This relation is transmitted to the number of claims, in the sense that

$$[N_{iT_i+1}\mid N_{i\bullet}=k] \preceq_{ST} [N_{iT_i+1}\mid N_{i\bullet}=k'] \quad\text{for } k\le k' \qquad (3.19)$$

$$\iff \Pr[N_{iT_i+1}>j\mid N_{i\bullet}=k] \le \Pr[N_{iT_i+1}>j\mid N_{i\bullet}=k'] \quad\text{whatever } j\text{, provided } k\le k'.$$

3.5.4 Positive Dependence Notions

In order to formalize the positive dependence involved in statement S3, we present some concepts of dependence related to $\preceq_{ST}$. The study of concepts of positive dependence for random variables, started in the late 1960s, has yielded numerous useful results in both statistical theory and applications. Applications of these concepts in actuarial science have recently received increased interest.

Let us formalize the positive dependence existing between the two components of a random couple (i.e. the fact that large values of one component tend to be associated with large values of the other). Formally, let $\boldsymbol{X}=(X_1,X_2)$ be a bivariate random vector. Then, $\boldsymbol{X}$ is positive regression dependent (PRD, for short) if

$$[X_2\mid X_1=x_1] \preceq_{ST} [X_2\mid X_1=x_1'] \text{ for all } x_1\le x_1' \quad\text{and}\quad [X_1\mid X_2=x_2] \preceq_{ST} [X_1\mid X_2=x_2'] \text{ for all } x_2\le x_2'.$$

PRD imposes stochastic increasingness of one component of the random couple in the value assumed by the other component, in the $\preceq_{ST}$-sense. This dependence notion is thus rather intuitive.

PRD naturally extends to higher dimensions. Specifically, let $\boldsymbol{X}=(X_1,\ldots,X_n)$ be an $n$-dimensional random vector. Then,

(i) $\boldsymbol{X}$ is conditionally increasing (CI, for short) if

$$[X_i\mid X_j=x_j,\,j\in J] \preceq_{ST} [X_i\mid X_j=x_j',\,j\in J]$$

whenever $x_j\le x_j'$, $j\in J$, $J\subset\{1,2,\ldots,n\}$ and $i\notin J$;

(ii) $\boldsymbol{X}$ is conditionally increasing in sequence (CIS, for short) if $X_i$ is stochastically increasing in $X_1,\ldots,X_{i-1}$, for $i\in\{2,\ldots,n\}$, i.e.

$$[X_i\mid X_1=x_1,\ldots,X_{i-1}=x_{i-1}] \preceq_{ST} [X_i\mid X_1=x_1',\ldots,X_{i-1}=x_{i-1}']$$

whenever $x_j\le x_j'$, $j\in\{1,\ldots,i-1\}$.

The conditional increasingness in sequence is interesting when there is a natural order in the components of $\boldsymbol{X}$, induced by observation times for instance.

3.5.5 Dependence Between Annual Claim Numbers

The total claim number $N_{i\bullet}$ reported in the past periods and the claim number $N_{iT_i+1}$ for the next coverage period are PRD. The fact that $N_{i\bullet}$ and $N_{iT_i+1}$ are PRD completes the statement (3.19) and provides a host of useful inequalities. In particular, whatever the distribution of $\Theta_i$, the credibility coefficient $E[\Theta_i\mid N_{i\bullet}=k]$ is increasing in $k$, which is easily deduced from (3.18).

Considering the dependence existing between the components of $\boldsymbol{N}_i$, i.e. between the $N_{it}$'s, $t=1,2,\ldots,T_i$, we can prove that $\boldsymbol{N}_i$ is CI.


3.5.6 Increasingness in the Linear Credibility Model

Let $\hat N_{iT_i+1}$ be the predictor (3.16) of $N_{iT_i+1}$. It can be shown that $N_{iT_i+1}$ is indeed increasing in $\hat N_{iT_i+1}$, in the sense that

$$[N_{iT_i+1}\mid \hat N_{iT_i+1}=p] \preceq_{ST} [N_{iT_i+1}\mid \hat N_{iT_i+1}=p'] \quad\text{whenever } p\le p'$$

$$\iff \Pr[N_{iT_i+1}>k\mid \hat N_{iT_i+1}=p] \le \Pr[N_{iT_i+1}>k\mid \hat N_{iT_i+1}=p'] \quad\text{whatever } k\text{, provided } p\le p'.$$

This means that the linear credibility premium is indeed a good predictor of the future claim number in the model A1–A2. Basically, we prove that increasing the linear credibility premium (i.e. degrading the claim record of the policyholder) makes the probability of observing more claims in the future greater.

3.6 Further Reading and Bibliographic Notes

3.6.1 Credibility Models

Credibility theory began with the papers by Mowbray (1914) and Whitney (1918). These papers sought to derive a premium that balances the experience of an individual risk against that of a class of risks. An excellent introduction to credibility theory can be found, e.g., in Goovaerts & Hoogstad (1987), Herzog (1994), Dannenburg, Kaas & Goovaerts (1996), Klugman, Panjer & Willmot (2004, Chapter 16) and Bühlmann & Gisler (2005). See also Norberg (2004) for an overview with useful references and links to Bayesian statistics and linear estimation. The underlying assumption of credibility theory, which sets it apart from formulas based on the individual risk alone, is that the risk parameter is regarded as a random variable. This naturally leads to a Bayesian approach to credibility theory. The book by Klugman (1992) provides an in-depth treatment of the question. See also the review papers by Makov et al. (1996) and Makov (2002). The connection between credibility formulas and Mellin transforms in the Poisson case was established by Albrecht (1984).

In a couple of seminal papers, Dionne & Vanasse (1989, 1992) proposed a credibility model which integrates a priori and a posteriori information on an individual basis. The unexplained heterogeneity was then modelled by the introduction of a latent variable representing the influence of hidden policy characteristics. Taking this random effect to be Gamma distributed yields the Negative Binomial model for the claim number. An excellent summary of the statistical models that may lead to experience rating in insurance can be found in Pinquet (2000), where the nature of the serial correlation (endogenous or exogenous) is also discussed.

There are many applications of credibility techniques to the various branches of insurance. Let us mention a nonstandard one, by Rejesus et al. (2006): these authors examined the feasibility of implementing an experience-based premium rate discount in crop insurance.

<strong>Credibility</strong> Models for <strong>Claim</strong> <strong>Counts</strong> 159<br />

3.6.2 Claim Count Distributions

Other credibility models for claim counts can be found in the literature, going beyond the mixed Poisson model studied in this chapter. The model suggested by Shengwang, Wei & Whitmore (1999) employs the Negative Binomial distribution for the conditional distribution of the annual claim numbers together with a Pareto structure function. Some credibility models are designed for stratified portfolios. Desjardins, Dionne & Pinquet (2001) considered fleets of vehicles, and used individual characteristics of both the vehicles and the carriers. See also Angers, Desjardins, Dionne & Guertin (2006).

An interesting alternative to the Negative Binomial model can be obtained using the conditional specification technique introduced by Arnold, Castillo & Sarabia (1999). The idea is to specify the joint distribution of $(N_t, \Theta)$ through its conditionals. More precisely, the conditional distribution of $N_t$ given $\Theta = \theta$ is $\mathcal{P}oi(\lambda(\theta))$ for some function $\lambda: \mathbb{R}^+ \to \mathbb{R}^+$, and the conditional distribution of $\Theta$ given $N_t = k$ is $\mathcal{G}am(\alpha(k), \tau(k))$, where $\alpha(\cdot)$ and $\tau(\cdot)$ are two functions mapping $\mathbb{N}$ to $\mathbb{R}^+$. For an application of the model to experience rating, see Sarabia, Gomez-Deniz & Vazquez-Polo (2004).
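To make the conditional specification concrete, the sketch below explores such a model by Gibbs sampling, alternating draws from the two conditionals. The particular functions $\lambda(\cdot)$, $\alpha(\cdot)$ and $\tau(\cdot)$ are illustrative placeholders (chosen here to be mutually compatible through Poisson–Gamma conjugacy), not the specifications of Arnold, Castillo & Sarabia (1999).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional specifications; these particular choices correspond
# to Theta ~ Gamma(1, 1) with N | Theta ~ Poisson(0.1 * Theta), so the two
# conditionals are compatible and the Gibbs chain targets that joint law.
lam   = lambda theta: 0.1 * theta   # N | Theta = theta ~ Poisson(lam(theta))
alpha = lambda k: 1.0 + k           # Theta | N = k ~ Gamma(alpha(k), rate tau(k))
tau   = lambda k: 1.1

def gibbs(n_iter=10_000, theta0=1.0):
    """Gibbs sampler alternating between the two conditional distributions."""
    theta, draws = theta0, []
    for _ in range(n_iter):
        n = rng.poisson(lam(theta))
        theta = rng.gamma(shape=alpha(n), scale=1.0 / tau(n))
        draws.append((n, theta))
    return draws

sample = gibbs()
print(np.mean([n for n, _ in sample]))   # Monte Carlo mean of N, close to 0.1
```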

3.6.3 Loss Functions

The quadratic loss function is by far the most widely used in practice. The results with the exponential loss function are taken from Bermúdez, Denuit & Dhaene (2000). Early references about the use of this kind of loss function include Ferreira (1977) and Lemaire (1979). Morillo & Bermudez (2003) used an exponential loss function in connection with the Poisson-Inverse Gaussian model.

Other loss functions can be envisaged. Young (1998a) uses a loss function that is a linear combination of a squared-error term and a second-derivative term. The squared-error term measures the accuracy of the estimator, while the second-derivative term constrains the estimator to be close to linear. See also Young & De Vylder (2000), where the loss function is a linear combination of a squared-error term and a term that encourages the estimator to be close to constant, especially in the tails of the distribution of claims, where Young (1997) noted the difficulty with her semiparametric approach. Young (2000) resorts to a loss function that can be decomposed into a squared-error term and a term that encourages the credibility premium to be constant. This author shows that by using this loss function, the problem of upward divergence noted in Young (1997) is reduced. See also Young (1998b). Young (2000) also provides a simple routine for minimizing the loss function, based on the discussion of De Vylder in Young (1998a).

Adopting the semiparametric model proposed in Young (1997, 2000), but considering that the piecewise linear function is simpler and more intuitive than the kernel, Huang, Song & Liang (2003) used the piecewise linear function as the estimate of the prior distribution and to obtain the estimates for the credibility formula.

3.6.4 Credibility and Regression Models

Hachemeister (1975) contributed to the credibility context by introducing a regression model. Since De Vylder (1985), it has been known that credibility formulas can be recovered from appropriate (non-)linear statistical regression models. More recently, Nelder & Verrall (1997) recognized that credibility theory can be encompassed within the theory of hierarchical generalized linear models developed by Lee & Nelder (1996). This extends to the GLM family the pioneering work by Norberg (1986). The likelihood-based approach can be carried out using standard statistical packages. All the assumptions underlying the model can be checked (e.g., using appropriate residual analyses), avoiding dogmatic application of risk theory models. The mean random effects are estimated for each policyholder by maximizing the hierarchical likelihood. With an appropriate choice for the distribution of the random effects and using the canonical link function, the estimate is in the form of a linear credibility premium. See also Bühlmann & Bühlmann (1999) and Luo, Young & Frees (2004). Frees, Young & Luo (1999) developed links between credibility theory and statistical models for longitudinal (or panel) data, as explained below.

Frees & Wang (2006) used a longitudinal data set-up, so that the experience from several risk classes is observed over time, whereas Frees et al. (1999) focussed on credibility predictors that are linear combinations of the data and/or that are minimizers of a squared-error loss function. In contrast, Frees & Wang (2006) considered severity distributions that may be long-tailed, so that averaging or using squared-error loss does not yield appropriate prediction tools.

Antonio & Beirlant (2007) suggested the use of generalized linear mixed models (where a transformation of the mean is expressed as a linear function of both fixed and random effects) in credibility theory. Many actuarial credibility models appear to be particular cases of generalized linear mixed models. Yeo & Valdez (2006) addressed a simultaneous dependence of claims across individuals for a fixed time period and across time periods for a fixed individual. This is accomplished by introducing the notion of a common effect affecting all individuals and another common effect affecting a fixed individual over time. This construction falls within the broader framework of generalized linear mixed models.

Lo, Fung & Zhu (2006) considered a regression credibility model with random regression coefficients. The variance components represented by the uncertainty about the regression coefficients then account for the heterogeneity in risks borne by policyholders across contracts. From a different perspective, the dependence between contracts has been introduced by treating the contract-specific regression coefficients as being generated by the same random mechanism, so that they are random deviations from the collective mean.

Autoregressive specifications of the error structure in the credibility context have been proposed by Bolancé et al. (2003). In Sundt (1983), the generalized Bühlmann–Straub model was proposed with consecutive error terms assumed to follow AR(1) dependences. Qian (2000) used the nonparametric regression method to establish estimators for credibility premiums under some principles of premium calculation. The asymptotic properties of the estimators are studied in this paper.

3.6.5 Credibility and Copulas

Copulas are a powerful tool to model dependencies between multivariate outcomes. See, e.g., Denuit et al. (2005) for an introduction. Several works have successfully applied copulas to solve credibility problems. Let us mention Frees & Wang (2005), who handled serial (time) dependence through a t-copula, and Frees & Wang (2006), who extended that formulation by introducing elliptical copulas for serial dependencies. Like the t-copula, the elliptical copulas turn out to have an analytically tractable form for predictive distributions.

Multivariate credibility models may be considered for several lines of business, or several types of claims. Multivariate credibility models are discussed, e.g., in Frees (2003). This topic will be treated in Chapter 6.

3.6.6 Time Dependent Random Effects

The vast majority of the papers which have appeared in the actuarial literature considered time-independent heterogeneous models. This chapter is restricted to the case of static random effects: in the classical credibility construction A1–A2 of Definition 3.1, the risk parameter $\Theta_i$ relating to policyholder $i$ is assumed to be constant over time. This is of course rather unrealistic, since driving ability may vary during the driving career (because of the learning effect, or modification in the risk characteristics). In automobile insurance, an unknown underlying random parameter that develops over time expresses the fact that the abilities of a driver are not constant. Moreover, the hidden exogenous variables revealed by claims experience may vary with time, as do observable ones.

Another reason to allow for random effects that vary with time relates to moral hazard. Indeed, individual efforts to prevent accidents are unobserved and feature temporal dependence. The policyholders may adjust their efforts for loss prevention according to their experience with past claims, the amount of premium and awareness of future consequences of an accident (due to experience rating schemes). The effort variable determines the moral hazard and is modelled by a dynamic unobserved factor.

Of course, it is hopeless in practice to discriminate between residual heterogeneity due to unobservable characteristics of drivers that significantly affect the risk of accident, and their individual efforts to prevent such accidents. Both effects get mixed in the latent process. Since the observed contagion between annual claim numbers is always positive, the effect of omitted explanatory variables seems to dominate moral hazard. Anyway, this issue has no practical implication since predictions depend on observed contagion, but not on its nature.

Hence, instead of assuming that the risk characteristics are given once and for all by a single risk parameter, we might suppose that the unknown risk characteristics of each policy are described by dynamic random effects. In the terminology of Jewell (1975), evolutionary credibility models allow for random effects to vary in successive periods. Now, the $i$th policy of the portfolio, $i = 1, 2, \ldots, n$, is represented by a double sequence $(\boldsymbol{\Theta}_i, \boldsymbol{N}_i)$ where $\boldsymbol{\Theta}_i$ is a positive random vector with unit mean representing the unexplained heterogeneity. Specifically, the model is based on the following assumptions:

B1 Given $\boldsymbol{\Theta}_i = \boldsymbol{\theta}_i$, the random variables $N_{it}$, $t = 1, 2, \ldots, T_i$, are independent and conform to the Poisson distribution with mean $\lambda_{it}\theta_{it}$, i.e.

$$\Pr[N_{it} = k \mid \Theta_{it} = \theta_{it}] = \exp(-\lambda_{it}\theta_{it})\,\frac{(\lambda_{it}\theta_{it})^k}{k!}, \quad k = 0, 1, \ldots,$$

with $\lambda_{it} = d_{it}\exp(\boldsymbol{\beta}^T\tilde{\boldsymbol{x}}_{it})$.

B2 At the portfolio level, the $\boldsymbol{\Theta}_i$s are assumed to be independent. Moreover, $(\Theta_{i1}, \ldots, \Theta_{iT_i})$ is distributed as $(\Theta_1, \ldots, \Theta_{T_i})$, where $\boldsymbol{\Theta} = (\Theta_1, \ldots, \Theta_{T_{\max}})^T$ is a stationary random vector (with $T_{\max} = \max_i T_i$). It is further assumed that $E[\Theta_{it}] = 1$ for all $i$, $t$. The unit mean condition is imposed for identification (otherwise, the mean could be absorbed into the intercept term of the Poisson regression). This condition means that the a priori ratemaking is correct on average.

B3 The sequences $(\boldsymbol{\Theta}_i, \boldsymbol{N}_i)$, $i = 1, 2, \ldots, n$, are assumed to be independent.
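To make assumptions B1–B2 concrete, the following sketch simulates one policy's claim history under a dynamic random effect. The choice of stationary process for $\boldsymbol{\Theta}$ (a lognormal AR(1), mean-corrected so that $E[\Theta_{it}] = 1$) and all numerical values are illustrative assumptions; the model itself only requires stationarity and unit means.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_policy(lam, T, rho=0.6, sigma=0.4):
    """Simulate (Theta_1..Theta_T, N_1..N_T) under B1-B2 with an illustrative
    lognormal AR(1) random effect: Z_t = rho*Z_{t-1} + eps_t and
    Theta_t = exp(Z_t - Var(Z)/2), so that E[Theta_t] = 1."""
    var_z = sigma**2 / (1 - rho**2)            # stationary variance of Z
    z = rng.normal(0, np.sqrt(var_z))          # start in the stationary law
    theta, n = np.empty(T), np.empty(T, dtype=int)
    for t in range(T):
        theta[t] = np.exp(z - var_z / 2)       # unit-mean correction
        n[t] = rng.poisson(lam[t] * theta[t])  # B1: Poisson given the effect
        z = rho * z + rng.normal(0, sigma)     # AR(1) step keeps stationarity
    return theta, n

lam = np.full(10, 0.1)                         # a priori frequencies lambda_it
theta, n = simulate_policy(lam, T=10)
print(theta.round(2), n)
```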

Purcaru & Denuit (2003) studied the kind of dependence arising in these credibility models for claim counts. Albrecht (1985) studied such credibility models for claim counts, whereas Gerber & Jones (1975) and Sundt (1981, 1988) dealt with general random variables.

A fundamental difference between static (A1–A3) and dynamic (B1–B3) credibility models is that the latter incorporate the age of the claims in the risk prediction, whereas the former neglect this information. Since we intuitively feel that the predictive ability of a claim should decrease with its age, the dynamic specification seems more acceptable. As pointed out by Pinquet, Guillén & Bolancé (2001), dynamic credibility models agree with the economic analysis of multiperiod optimal insurance under moral hazard. In this optic, the stationarity of the $\boldsymbol{\Theta}_i$s implies that the predictive ability of claims depends mainly on the lag between the date of prediction and the date of occurrence (because of the time translation invariance of the marginals of the $\boldsymbol{\Theta}_i$s).

Empirical studies performed on panel data, as in Pinquet, Guillén & Bolancé (2001) and Bolancé, Guillén & Pinquet (2003), support time-varying (or dynamic) random effects. An interesting feature of the credibility premium derived from stationary random effects with a decreasing correlogram is that the ages of the claims are taken into account in the a posteriori correction: a recent claim will be more penalized than an old one (whereas the age of the claim is not taken into account with static random effects).

This kind of a posteriori correction reconciles actuaries' and economists' approaches to experience rating. Henriet & Rochet (1986) distinguished two roles played by a posteriori corrections, showing that these two roles involve different rating structures. The first role deals with the problem of adverse selection, where the very aim is to evaluate as accurately as possible the true distribution of reported accidents. This is the classical actuarial perspective. The second role is linked to moral hazard and implies that the distribution of reported accidents over time must be taken into account to maintain incentives to drive carefully. This means that more weight must be given to recent information in order to maintain such incentives. This is the economic point of view. The credibility model B1–B3 with dynamic random effects, although theoretically more intricate, takes these two objectives into account.

3.6.7 Credibility and Panel Data Models

Frees, Young & Luo (1999) developed links between credibility theory and longitudinal (or panel) data models. They demonstrated how longitudinal data models can be applied to the credibility ratemaking problem. As pointed out by these authors, by expressing credibility ratemaking applications in the framework of longitudinal data models, actuaries can realize several benefits: (1) longitudinal data models provide a wide variety of models from which to choose; (2) standard statistical software makes analysing data relatively easy; (3) actuaries have another method for explaining the ratemaking process; (4) actuaries can use graphical and diagnostic tools to select a model and assess its usefulness.



3.6.8 Credibility and Empirical Bayes Methods

Credibility theory has an empirical Bayes flavour, as pointed out by Norberg (1980). Analyses commonly used in highway safety include the widely applied empirical Bayes method. According to Lord (2006), this method has become increasingly popular since it corrects for the regression-to-the-mean bias, refines the predicted mean of an entity, and is relatively simple to manipulate compared to the full Bayes approach. The empirical Bayes method combines information obtained from a reference group having similar characteristics with the information specific to the individual under study. A weight factor is assigned to both the reference population and the individual under study, and a credibility formula is obtained.

We refer the interested reader to Quigley, Bedford & Walls (2006) for a case study involving the rate of occurrence of train derailments within the United Kingdom.


4

Bonus-Malus Scales

4.1 Introduction

4.1.1 From Credibility to Bonus-Malus Scales

One of the main tasks of the actuary is to design a tariff structure that will fairly distribute the burden of claims among policyholders. To this end, he often has to partition all policies into risk classes, with all policyholders belonging to the same class paying the same premium. It is convenient to achieve a priori classification by resorting to generalized linear models (e.g. Poisson regression for claim counts), as explained in Chapter 2. However, many important factors cannot be taken into account at this stage. Consequently, risk classes are still quite heterogeneous despite the use of many a priori variables.

Rating systems penalizing insureds responsible for one or more accidents by premium surcharges (or maluses), and rewarding claim-free policyholders by awarding them discounts (or bonuses), are now in force in many developed countries. Besides encouraging policyholders to drive carefully (i.e. counteracting moral hazard), they aim to better assess individual risks. The amount of premium is adjusted each year on the basis of the individual claims experience using techniques from credibility theory, as shown in Chapter 3.

However, credibility formulas are difficult to implement in practice because of their mathematical complexity (complexity refers here to the sphere of commercial relations, where customers are often reluctant to use mechanisms that they consider to be complex, especially in connection with insurance products). For this reason, bonus-malus scales have been proposed by insurance companies. Such scales have to be seen as commercial versions of credibility formulas. The typical customer can figure out what the premium will be for any given claims history.




4.1.2 The Nature of Bonus-Malus Scales

When a merit rating plan is in force, the amount of premium paid by the policyholder depends on the rating factors of the current period but also on claim history. In practice, a bonus-malus scale consists of a finite number of levels, each with its own relative premium. New policyholders have access to a specified level. After each year, the policy moves up or down according to transition rules and to the number of claims at fault. The premium charged to a policyholder is obtained by applying the relative premium associated with his current level in the scale to a base premium depending on his observable characteristics incorporated into the price list.

4.1.3 Relativities

The problem addressed in this chapter is the determination of the relative premiums attached to each of the levels of the scale when a priori classification is used by the company. The relativity associated with level $l$ is denoted as $r_l$. The meaning is that a policyholder occupying level $l$ in the bonus-malus scale has to pay $r_l$ times the base premium to be covered by the insurance company.

The severity of the a posteriori corrections must depend on the extent to which amounts of premiums vary according to observable characteristics of policyholders. The key idea is that both a priori classification and a posteriori corrections aim to create tariff cells as homogeneous as possible. The residual heterogeneity inside each of these cells being smaller for insurers incorporating more variables in their a priori ratemaking, the a posteriori corrections must be softer for those insurers.

The framework of credibility theory, with its fundamental notion of randomly distributed risk parameters, was employed in the analysis of bonus-malus systems by Pesonen as early as 1963. In this chapter, we will keep the framework of Definition 3.1. According to Norberg (1976), once the number of classes, the starting level and the transition rules have been fixed, the optimal relativity associated with level $l$ is determined by maximizing the asymptotic predictive accuracy. Formally, the relativities minimize the mean squared deviation between a policy's expected claim frequency and its premium in year $t$ as $t \to +\infty$. The optimal relativity for level $l$ is thus equal to the conditional expected risk parameter for an infinitely old policy, given that the policy is in level $l$.

4.1.4 Bonus-Malus Scales and Markov Chains

In most of the commercial bonus-malus systems, the knowledge of the current level and the number of claims during the current period suffice to determine the next level in the scale. So the future (the level for year $t+1$) depends only on the present (the level for year $t$ and the number of accidents reported during year $t$) and not on the past. This is closely related to the memoryless property of Markov chains. If the claim numbers in different years are (conditionally) independent, then the trajectory of a given policyholder in the bonus-malus scale will be a (conditional) Markov chain. Sometimes, fictitious levels have to be introduced to recover the memoryless property.

The treatment of bonus-malus scales is best performed in the framework of Markov chains. This chapter is nevertheless self-contained, and does not require any prior knowledge of this topic. All the useful results have been derived in an elementary way (readers acquainted with the theory of Markov chains will rapidly recognize all the classical machinery taught in textbooks devoted to stochastic processes).

4.1.5 Financial Equilibrium

Exactly as for credibility mechanisms, it is important that the relativities average to 100 %, resulting in financial equilibrium. This fundamental property is highly desirable: it guarantees that the introduction of a bonus-malus system has no impact on the yearly premium collection. The distribution of the amounts paid by the policyholders is modified according to the reported claims, but on the whole the company gets the same amount of money.

Throughout this chapter, we work with the long-run equilibrium distribution of policyholders in the bonus-malus levels. We will see that in the long run, the way the relativities are computed in this chapter ensures that the bonus-malus system is financially stable. Things are however more complicated in practice. Specifically, some undesirable phenomena can arise in a transient regime. These issues will be addressed in Chapters 8–9.

4.1.6 Agenda

In Section 4.2, the trajectory of the policyholder across the bonus-malus levels is modelled as a Markov chain. Section 4.3 is devoted to transition probabilities, that is, the probability that the policyholder moves from one level to another over a given time horizon. The long-term behaviour of the scale is studied in Section 4.4. It is shown there that the proportions of policyholders in each level of the scale tend to stabilize over time. Various methods to compute the stationary probabilities are described.

Section 4.5 explains how to compute the relativities using a quadratic loss function. As for credibility formulas, relativities that are linear in the bonus-malus level are also derived. Section 4.5.3 examines the interaction between the bonus-malus scale and a priori risk classification. It is shown there that creating several scales decreases the rating inadequacies.

In Section 4.6, the quadratic loss function is replaced with an exponential one. A comparison with quadratic relativities is performed, and the influence of the severity parameter is carefully assessed.

In Section 4.7, we will consider the so-called special bonus rule. According to this rule, a policyholder who did not report any claim for a certain number of years, and is still in the malus zone (i.e. in a level with a relativity above 100 %), is automatically sent to the initial level (i.e. to the level with relativity equal to 100 %). Many compulsory systems formerly imposed by governments included such a rule. Because of this special rule, the stochastic process describing the trajectory of the drivers across the levels is no longer Markovian. The memoryless property can nevertheless be re-obtained by adding fictitious levels in the scale. To fix the ideas, we will study the special bonus rule in the former compulsory Belgian system.

In a competitive market, it can be expected that some policyholders switch from one insurance company to another. In a regulated framework, with a unique compulsory bonus-malus system imposed on all the insurance companies, the drivers will be subject to the same a posteriori corrections whatever the insurer. If a driver decides to switch to another insurer, he must first obtain a certificate from the former insurer stating his attained bonus-malus level and whether pending claims could affect this level. The new insurer must then award the same discount or apply the same surcharges. The competition between insurance companies is limited to the services offered and the a priori premiums.

Things become more complicated in a deregulated market, where each insurer is free to design its own bonus-malus system. Then, insurers compete also on the basis of a posteriori corrections. It rapidly becomes extremely difficult for policyholders to determine the optimal insurance provider, since companies apply different penalties when claims are reported. In Section 4.8, we consider a policyholder switching from insurer A to insurer B. He occupies level $l_1$ in the bonus-malus scale used by company A, and the question is where to place him in the bonus-malus scale used by company B.

Section 4.9 examines the dependence properties existing between the successive levels occupied by the policyholders and the random risk parameter. It is argued that, contrary to the results obtained with credibility models, the risk parameters do not necessarily increase with the level occupied in the scale.

The final Section 4.10 gives references and addresses further issues.

4.2 Modelling Bonus-Malus Systems

4.2.1 Typical Bonus-Malus Scales

Before embarking on an abstract definition of bonus-malus systems, let us discuss a couple of examples that will be used throughout this chapter.

Example 4.1 (−1/Top Scale) This bonus-malus scale has 6 levels (numbered 0 to 5). Policyholders are classified according to the number of claim-free years since their last claim (0, 1, 2, 3, 4 or at least 5). After a claim, all premium reductions are lost. The transition rules are described in Table 4.1. Specifically, the starting class is the highest level 5. Each claim-free year is rewarded by one bonus class. When an accident is reported, all the discounts are lost and the policyholder is transferred to level 5.

Table 4.1 Transition rules for the scale −1/top.

    Starting    Level occupied if
    level       0        ≥ 1
                claim(s) is/are reported
    0           0        5
    1           0        5
    2           1        5
    3           2        5
    4           3        5
    5           4        5

Note that the philosophy behind such a bonus-malus system is different from credibility theory. Indeed, this bonus-malus scale only aims to counteract moral hazard: it is in fact more or less equivalent to a deductible which is not paid at once but smoothed over the time needed to go back to the lowest class. Note however that this 'smoothed' deductible only applies to the first claim: subsequent claims are 'for free'.

Example 4.2 (−1/+2 Scale) There are 6 levels. Level 5 is the starting level. A higher level number indicates a higher premium. The discount per claim-free year is one level: if no claims have been reported by the policyholder, then he moves one level down. The penalty per claim is two levels. If a number of claims $n_t > 0$ has been reported during year $t$, then the policyholder moves $2n_t$ levels up. The transition rules are described in Table 4.2.

In the subsequent sections, we will also make the −1/+2 scale more severe, by penalizing each claim by 3 levels instead of 2. This alternative bonus-malus system will henceforth be referred to as the −1/+3 system.

4.2.2 Characteristics of Bonus-Malus Scales

The bonus-malus scales investigated in this book are assumed to possess $s+1$ levels, numbered from 0 to $s$. A specified level is assigned to a new driver. In practice, the initial level may depend upon the use of the vehicle (or upon another observable risk characteristic). Each claim-free year is rewarded by a bonus point (i.e. the driver goes one level down). Claims are penalized by malus points (i.e. the driver goes up a certain number of levels each time he files a claim). We assume that the penalty is a given number of classes per claim. This facilitates the mathematical treatment of the problem. More general systems can nevertheless be considered, with higher penalties for subsequent claims. After sufficiently many claim-free years, the driver enters level 0, where he enjoys the maximal bonus.

In Chapter 3, updating premiums with credibility formulas only uses the total number of claims reported by the policyholder in the past. The new premium does not depend on the way the accidents are distributed over the years. This property is never satisfied by bonus-malus systems, where it would be in the policyholder's interest to concentrate all the claims during a single year.

In commercial bonus-malus systems, the knowledge of the present level and of the number of claims of the present year suffices to determine the next level. Together with the (conditional) independence of annual claim numbers, this ensures that the trajectory across the bonus-malus levels may be represented by a (conditional) Markov chain: the future (the level for year $t+1$) depends on the present (the level for year $t$ and the number of accidents reported during year $t$) and not on the past (the complete claim history and the levels occupied during years $1, 2, \ldots, t-1$). Sometimes, fictitious levels have to be introduced in order to meet this memoryless property. Indeed, in some bonus-malus systems, policyholders occupying high levels are sent to the starting class after a few years without claims. This issue will be addressed in Section 4.7.

Table 4.2 Transition rules for the scale −1/+2.

    Starting    Level occupied if
    level       0     1     2     ≥ 3
                claim(s) is/are reported
    5           4     5     5     5
    4           3     5     5     5
    3           2     5     5     5
    2           1     4     5     5
    1           0     3     5     5
    0           0     2     4     5

4.2.3 Trajectory

New drivers start in level $l_0$ of the scale. Note that experienced drivers arriving in the portfolio are not necessarily placed in level $l_0$, but in a level corresponding to their claim history or to the level occupied in the bonus-malus scale used by a competitor. This problem will be dealt with in Section 4.8.

The trajectory of the policyholder in the bonus-malus scale is modelled by a sequence $L_1, L_2, \ldots$ of random variables valued in $\{0, 1, \ldots, s\}$, such that $L_k$ is the level occupied during the $(k+1)$th year, i.e. during the time interval $[k, k+1)$. Since movements in the scale occur once a year (at policy anniversary), the policyholder occupies level $L_k$ from time $k$ until time $k+1$. Once the number $N_k$ of claims reported by the policyholder during $(k-1, k]$ is known, this information is used to reevaluate the position of the driver in the scale. We supplement the sequence of the $L_k$s with $L_0 = l_0$.

The $L_k$s obviously depend on the past numbers of claims $N_1, N_2, \ldots, N_k$ reported by the policyholder. If we denote as 'pen' the penalty induced by each claim (expressed as a number of levels), then the $L_k$s obey the recursion

$$L_k = \begin{cases} \max\{L_{k-1} - 1,\, 0\} & \text{if } N_k = 0\\[4pt] \min\{L_{k-1} + N_k \times \text{pen},\, s\} & \text{if } N_k \geq 1 \end{cases} \;=\; \max\Big\{\min\big\{L_{k-1} - (1 - I_k) + N_k \times \text{pen},\, s\big\},\, 0\Big\}$$

where

$$I_k = \begin{cases} 1 & \text{if } N_k \geq 1\\ 0 & \text{otherwise} \end{cases}$$

indicates whether at least one claim has been reported in year $k$. This is an example of a stochastic recursive equation. This representation of the $L_k$s clearly shows that the future trajectory of the policyholder in the scale is independent of the levels occupied in the past, provided that the present level is given. This conditional independence property is at the heart of Markov models.

The stochastic recursive equations given above assume that the bonus is lost if at least one claim is filed with the company. In some cases (like the former compulsory Belgian bonus-malus scale), the bonus is granted in any case. The $L_k$s then obey the recursion

$$L_k = \max\Big\{\min\big\{L_{k-1} - 1 + N_k \times \text{pen},\, s\big\},\, 0\Big\}.$$

This means that the first claim is penalized by pen − 1 levels, and the subsequent ones by pen levels.
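The recursion is straightforward to implement. Below is a minimal sketch (the function name is ours, for illustration), applied to the −1/+2 scale of Example 4.2 under the first rule, where the bonus is lost as soon as a claim is filed.

```python
import numpy as np

rng = np.random.default_rng(42)

def trajectory(l0, s, pen, claims):
    """Apply L_k = max(min(L_{k-1} - (1 - I_k) + N_k*pen, s), 0) year by year."""
    levels, l = [], l0
    for n_k in claims:
        i_k = 1 if n_k >= 1 else 0
        l = max(min(l - (1 - i_k) + n_k * pen, s), 0)
        levels.append(l)
    return levels

# Ten years of Poisson claim counts with annual frequency 0.1,
# in the -1/+2 scale (s = 5, starting level 5, penalty 2 per claim)
claims = rng.poisson(0.1, size=10)
print(claims, trajectory(l0=5, s=5, pen=2, claims=claims))
```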



4.2.4 Transition Rules

The probability of moving from one level to another depends on the number of claims reported during the current year. Therefore, we can introduce more formally the transition rules which impose the transfer from one level to another once the number of claims is known. If $k$ claims are reported,

$$t_{ij}(k) = \begin{cases} 1 & \text{if the policy gets transferred from level } i \text{ to level } j,\\ 0 & \text{otherwise.} \end{cases}$$

The $t_{ij}(k)$s are put in matrix form $T(k)$, i.e.

$$T(k) = \begin{pmatrix} t_{00}(k) & t_{01}(k) & \cdots & t_{0s}(k)\\ t_{10}(k) & t_{11}(k) & \cdots & t_{1s}(k)\\ \vdots & \vdots & & \vdots\\ t_{s0}(k) & t_{s1}(k) & \cdots & t_{ss}(k) \end{pmatrix}.$$

Then $T(k)$ is a 0-1 matrix having exactly one 1 in each row.

Example 4.3 (−1/Top Scale) In this case, we have

$$T(0) = \begin{pmatrix} 1&0&0&0&0&0\\ 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0 \end{pmatrix} \qquad T(1) = \begin{pmatrix} 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1 \end{pmatrix}$$

and $T(k) = T(1)$ for all $k \geq 2$.

Example 4.4 (−1/+2 Scale) In this case, we have

$$T(0) = \begin{pmatrix} 1&0&0&0&0&0\\ 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0 \end{pmatrix} \qquad T(1) = \begin{pmatrix} 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1 \end{pmatrix}$$

$$T(2) = \begin{pmatrix} 0&0&0&0&1&0\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1 \end{pmatrix} \qquad T(k) = \begin{pmatrix} 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1\\ 0&0&0&0&0&1 \end{pmatrix} \text{ for all } k \geq 3.$$
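Rather than enumerating the $T(k)$ matrices by hand, they can be generated from the level recursion of Section 4.2.3. The helper below is a minimal sketch (its name and interface are ours, not the book's); it reproduces the matrices of Examples 4.3 and 4.4.

```python
import numpy as np

def transition_matrix_T(k, s, pen):
    """0-1 matrix T(k): row i has a single 1 in the column given by the
    recursion max(min(i - (1 - I_k) + k*pen, s), 0) with I_k = 1{k >= 1}."""
    T = np.zeros((s + 1, s + 1), dtype=int)
    for i in range(s + 1):
        j = max(min(i - (1 if k == 0 else 0) + k * pen, s), 0)
        T[i, j] = 1
    return T

# -1/+2 scale of Example 4.4: T(1) sends level 0 to 2, level 1 to 3, ...
print(transition_matrix_T(1, s=5, pen=2))
```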



4.3 Transition Probabilities

4.3.1 Definition

Let us now assume that $N_1, N_2, \ldots$ are independent and $\mathcal{P}oi(\vartheta)$ distributed. The trajectory will be denoted as $\{L_1(\vartheta), L_2(\vartheta), \ldots\}$ to emphasize the dependence upon the annual expected claim frequency $\vartheta$. Note however that the argument $\vartheta$ in $L_k(\vartheta)$ does not mean that the $L_k(\vartheta)$s are functions of the parameter $\vartheta$, but only that their distribution depends on $\vartheta$.

Let $p_{l_1 l_2}(\vartheta)$ be the probability of moving from level $l_1$ to level $l_2$ for a policyholder with annual mean claim frequency $\vartheta$, that is,

$$p_{l_1 l_2}(\vartheta) = \Pr[L_{k+1}(\vartheta) = l_2 \mid L_k(\vartheta) = l_1],$$

with $l_1, l_2 \in \{0, 1, \ldots, s\}$. Clearly, the $p_{l_1 l_2}(\vartheta)$s satisfy $p_{l_1 l_2}(\vartheta) \geq 0$ for all $l_1$ and $l_2$, and

$$\sum_{l_2=0}^{s} p_{l_1 l_2}(\vartheta) = 1. \qquad (4.1)$$

Moreover, the transition probabilities can be expressed using the $t_{ij}(\cdot)$s introduced above. To see this, it suffices to write

$$p_{l_1 l_2}(\vartheta) = \sum_{n=0}^{+\infty} \Pr[L_{k+1}(\vartheta) = l_2 \mid N_{k+1} = n, L_k(\vartheta) = l_1]\,\Pr[N_{k+1} = n \mid L_k(\vartheta) = l_1] = \sum_{n=0}^{+\infty} \frac{\vartheta^n}{n!}\exp(-\vartheta)\, t_{l_1 l_2}(n).$$

Note that we have used the fact that $N_{k+1}$ and $L_k(\vartheta)$ are independent (since $L_k(\vartheta)$ depends on $N_1, \ldots, N_k$), so that

$$\Pr[N_{k+1} = n \mid L_k(\vartheta) = l_1] = \Pr[N_{k+1} = n] = \frac{\vartheta^n}{n!}\exp(-\vartheta).$$

The transition probabilities allow the actuary to compute the probability of any trajectory in the scale. Specifically, since the probability that a certain policyholder with expected annual claim frequency $\vartheta$ is in levels $l_1, \ldots, l_n$ at times $1, \ldots, n$ is simply the probability of going from $l_0$ to $l_n$ via the intermediate levels $l_1, \ldots, l_{n-1}$, we have

$$\Pr[L_1(\vartheta) = l_1, \ldots, L_n(\vartheta) = l_n \mid L_0(\vartheta) = l_0] = p_{l_0 l_1}(\vartheta) \cdots p_{l_{n-1} l_n}(\vartheta). \qquad (4.2)$$

Furthermore, it is enough to know the current position in the scale to determine the probability of being transferred to any other level in the bonus-malus scale. Formally,

$$\Pr[L_n(\vartheta) = l_n \mid L_{n-1}(\vartheta) = l_{n-1}, \ldots, L_0(\vartheta) = l_0] = p_{l_{n-1} l_n}(\vartheta)$$

whenever $\Pr[L_{n-1}(\vartheta) = l_{n-1}, \ldots, L_0(\vartheta) = l_0] > 0$.



4.3.2 Transition Matrix

Further, $P(\vartheta)$ is the one-step transition matrix, i.e.

$$P(\vartheta) = \begin{pmatrix} p_{00}(\vartheta) & p_{01}(\vartheta) & \cdots & p_{0s}(\vartheta)\\ p_{10}(\vartheta) & p_{11}(\vartheta) & \cdots & p_{1s}(\vartheta)\\ \vdots & \vdots & & \vdots\\ p_{s0}(\vartheta) & p_{s1}(\vartheta) & \cdots & p_{ss}(\vartheta) \end{pmatrix}.$$

From (4.1), we see that the matrix $P(\vartheta)$ is a stochastic matrix. As already mentioned, the future level of a policyholder is independent of its past levels and only depends on its present level (and also on the number of claims reported during the present year).

In matrix form, we can write $P(\vartheta)$ as

$$P(\vartheta) = \sum_{k=0}^{+\infty} \frac{\vartheta^k}{k!}\exp(-\vartheta)\, T(k),$$

provided the $N_t$s are independent and $\mathcal{P}oi(\vartheta)$ distributed.
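Numerically, the series can be summed exactly: for $k$ large enough, any claim count already sends every level to $s$, so $T(k)$ no longer changes and the whole Poisson tail can be lumped into that constant matrix. A sketch, reusing the hypothetical `transition_matrix_T` helper introduced above:

```python
import numpy as np
from math import exp, factorial

def one_step_P(theta, s, pen):
    """P(theta) = sum over k of Poisson(theta) weights times T(k).
    For k >= s + 1, T(k) is constant (every level is sent to s), so the
    Poisson tail mass is lumped into that matrix, making the sum exact."""
    k_max = s + 1
    P = np.zeros((s + 1, s + 1))
    tail = 1.0
    for k in range(k_max):
        w = exp(-theta) * theta**k / factorial(k)
        P += w * transition_matrix_T(k, s, pen)
        tail -= w
    P += tail * transition_matrix_T(k_max, s, pen)
    return P

print(one_step_P(0.1, s=5, pen=2).round(6))   # compare with Example 4.8 below
```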

Example 4.5 (−1/Top Scale) The transition matrix $P(\vartheta)$ associated with this bonus-malus system is given by

$$P(\vartheta) = \begin{pmatrix}
\exp(-\vartheta) & 0 & 0 & 0 & 0 & 1 - \exp(-\vartheta)\\
\exp(-\vartheta) & 0 & 0 & 0 & 0 & 1 - \exp(-\vartheta)\\
0 & \exp(-\vartheta) & 0 & 0 & 0 & 1 - \exp(-\vartheta)\\
0 & 0 & \exp(-\vartheta) & 0 & 0 & 1 - \exp(-\vartheta)\\
0 & 0 & 0 & \exp(-\vartheta) & 0 & 1 - \exp(-\vartheta)\\
0 & 0 & 0 & 0 & \exp(-\vartheta) & 1 - \exp(-\vartheta)
\end{pmatrix}.$$

Example 4.6 (−1/+2 Scale) The transition matrix $P(\vartheta)$ associated with this bonus-malus system is given by

$$P(\vartheta) = \begin{pmatrix}
\exp(-\vartheta) & 0 & \vartheta\exp(-\vartheta) & 0 & \frac{\vartheta^2}{2}\exp(-\vartheta) & 1 - \alpha_1(\vartheta)\\
\exp(-\vartheta) & 0 & 0 & \vartheta\exp(-\vartheta) & 0 & 1 - \alpha_2(\vartheta)\\
0 & \exp(-\vartheta) & 0 & 0 & \vartheta\exp(-\vartheta) & 1 - \alpha_3(\vartheta)\\
0 & 0 & \exp(-\vartheta) & 0 & 0 & 1 - \exp(-\vartheta)\\
0 & 0 & 0 & \exp(-\vartheta) & 0 & 1 - \exp(-\vartheta)\\
0 & 0 & 0 & 0 & \exp(-\vartheta) & 1 - \exp(-\vartheta)
\end{pmatrix}$$

where $\alpha_i(\vartheta)$ represents the sum of the elements in columns 1 to 5 of row $i$, $i = 1, 2, 3$, that is,

$$\alpha_1(\vartheta) = \exp(-\vartheta)\Big(1 + \vartheta + \frac{\vartheta^2}{2}\Big), \qquad \alpha_2(\vartheta) = \alpha_3(\vartheta) = \exp(-\vartheta)(1 + \vartheta).$$



4.3.3 Multi-Step Transition Probabilities

The probability

$$p^{(n)}_{ij}(\vartheta) = \Pr[L_{k+n}(\vartheta) = j \mid L_k(\vartheta) = i]$$

evaluates the likelihood of being transferred from level $i$ to level $j$ in $n$ steps. Note that this is the probability that $L_{k+n}(\vartheta) = j$ given $L_k(\vartheta) = i$ for any $k$. The process describing the trajectory of the policyholder across the levels is thus stationary. From

$$p^{(n)}_{ij}(\vartheta) = \sum_{i_1=0}^{s}\sum_{i_2=0}^{s}\cdots\sum_{i_{n-1}=0}^{s} p_{ii_1}(\vartheta)\,p_{i_1 i_2}(\vartheta)\cdots p_{i_{n-1}j}(\vartheta)$$

we clearly see that it includes all the possible paths from $i$ to $j$ and the probability of their occurrence. This is the $n$-step transition probability $p^{(n)}_{ij}(\vartheta)$. Therefore, the matrix

$$P^{(n)}(\vartheta) = \begin{pmatrix} p^{(n)}_{00}(\vartheta) & p^{(n)}_{01}(\vartheta) & \cdots & p^{(n)}_{0s}(\vartheta)\\ p^{(n)}_{10}(\vartheta) & p^{(n)}_{11}(\vartheta) & \cdots & p^{(n)}_{1s}(\vartheta)\\ \vdots & \vdots & & \vdots\\ p^{(n)}_{s0}(\vartheta) & p^{(n)}_{s1}(\vartheta) & \cdots & p^{(n)}_{ss}(\vartheta) \end{pmatrix}$$

is called the $n$-step transition matrix corresponding to $P(\vartheta)$.

The following result shows that $P^{(n)}(\vartheta)$ is a stochastic matrix, being the $n$th power of the one-step transition matrix $P(\vartheta)$.

Property 4.1 For all $n, m = 0, 1, \ldots$,

$$P^{(n)}(\vartheta) = P^n(\vartheta) \qquad (4.3)$$

and hence,

$$P^{(n+m)}(\vartheta) = P^{(n)}(\vartheta)\,P^{(m)}(\vartheta). \qquad (4.4)$$

Proof The proof is by induction on $n$. The result is obviously true for $n = 1$. Assume it holds for $n$ and let us show that it is still true for $n+1$. Clearly, by conditioning on the level $l$ occupied at time $n$ we get

$$p^{(n+1)}_{ij}(\vartheta) = \sum_{l=0}^{s} p^{(n)}_{il}(\vartheta)\,p_{lj}(\vartheta), \qquad (4.5)$$

which corresponds to matrix multiplication. This proves (4.3), from which (4.4) readily follows. □

The matrix identity (4.4) is usually called the Chapman–Kolmogorov equation. Taking the $n$th power of $P(\vartheta)$ yields the $n$-step transition matrix whose element $(l_1, l_2)$, denoted as $p^{(n)}_{l_1 l_2}(\vartheta)$, is the probability of moving from level $l_1$ to level $l_2$ in $n$ transitions. Using Property 4.1, we get the following representation for the distribution of the state variable $L_n(\vartheta)$: denoting as

$$\boldsymbol{p}^{(k)}(\vartheta) = \big(\Pr[L_k(\vartheta) = 0], \ldots, \Pr[L_k(\vartheta) = s]\big)^T,$$

we have

$$\big(\boldsymbol{p}^{(k+n)}(\vartheta)\big)^T = \big(\boldsymbol{p}^{(k)}(\vartheta)\big)^T P^{(n)}(\vartheta). \qquad (4.6)$$

Remark 4.1 (Numerical Aspects) The computation of the probability distribution of $L_n(\vartheta)$ amounts to calculating the $n$th power of the transition matrix $P(\vartheta)$. For large values of $n$, this may pose some computational difficulties. This is why we now discuss an algebraic method which makes use of the concepts of eigenvalues and eigenvectors (that will be encountered further in this chapter).

The vector $\boldsymbol{v}$ with at least one component different from 0 is a right eigenvector of $P(\vartheta)$ if $P(\vartheta)\boldsymbol{v} = \zeta\boldsymbol{v}$ for some $\zeta \in \mathbb{C}$. In this case, $\zeta$ is said to be an eigenvalue of $P(\vartheta)$. Finding the eigenvalues of $P(\vartheta)$ amounts to solving the characteristic equation $\det(P(\vartheta) - \zeta I) = 0$. A nonzero vector $\boldsymbol{u}$ that is a solution of $\boldsymbol{u}^T P(\vartheta) = \zeta\boldsymbol{u}^T$ is called a left eigenvector corresponding to $\zeta$.

In general, the characteristic equation possesses $s+1$ solutions $\zeta_0, \ldots, \zeta_s$ which can be complex, and some of which can coincide (we assume that the eigenvalues are numbered so that $|\zeta_0| \geq |\zeta_1| \geq \cdots \geq |\zeta_s|$). The Perron–Frobenius theorem for regular matrices ensures that, provided the transition matrix $P(\vartheta)$ is regular, $\zeta_0 = 1$, $\boldsymbol{v}_0 = \boldsymbol{e}$, and $\boldsymbol{u}_0 \geq \boldsymbol{0}$ with $\boldsymbol{u}_0^T\boldsymbol{e} = \sum_{j=0}^{s} u_{0j} = 1$. Moreover, all the other eigenvalues of $P(\vartheta)$ lie inside the unit circle of the complex plane, that is, $|\zeta_j| < 1$ for $j = 1, \ldots, s$.

Let $V = (\boldsymbol{v}_0, \ldots, \boldsymbol{v}_s)$ be an $(s+1)\times(s+1)$ matrix consisting of right column eigenvectors and

$$U = \begin{pmatrix} \boldsymbol{u}_0^T\\ \vdots\\ \boldsymbol{u}_s^T \end{pmatrix}$$

an $(s+1)\times(s+1)$ matrix consisting of left eigenvectors as rows. Let us assume that the eigenvalues $\zeta_0, \ldots, \zeta_s$ are distinct. This ensures that $\boldsymbol{v}_0, \ldots, \boldsymbol{v}_s$ are linearly independent, so that $V$ is invertible. Moreover, $U = V^{-1}$. In this case, $P(\vartheta)$ can be represented as

$$P(\vartheta) = V\begin{pmatrix} \zeta_0 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \zeta_s \end{pmatrix}U = \sum_{j=0}^{s}\zeta_j\,\boldsymbol{v}_j\boldsymbol{u}_j^T.$$

This representation is useful for computing the $n$th power of $P(\vartheta)$ in that

$$P^n(\vartheta) = V\begin{pmatrix} \zeta_0^n & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \zeta_s^n \end{pmatrix}U = \sum_{j=0}^{s}\zeta_j^n\,\boldsymbol{v}_j\boldsymbol{u}_j^T.$$
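With a linear-algebra library, this spectral shortcut takes a few lines. The sketch below assumes the hypothetical `one_step_P` helper defined earlier and distinct eigenvalues (as in the remark); `numpy.linalg.eig` returns the right eigenvectors, and the left ones are recovered via $U = V^{-1}$.

```python
import numpy as np

P = one_step_P(0.1, s=5, pen=2)        # -1/+2 scale, theta = 0.1

zeta, V = np.linalg.eig(P)             # eigenvalues and right eigenvectors
U = np.linalg.inv(V)                   # rows of U are the left eigenvectors

def P_power(n):
    """P^n via the spectral representation V diag(zeta^n) U."""
    return (V @ np.diag(zeta**n) @ U).real

# Agrees with repeated multiplication; the dominant eigenvalue is 1,
# as the Perron-Frobenius theorem predicts for a regular stochastic matrix.
print(np.abs(P_power(20) - np.linalg.matrix_power(P, 20)).max())  # ~1e-15
print(max(abs(zeta)))                                             # ~1.0
```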



4.3.4 Ergodicity and Regular Transition Matrix

A Markov chain with transition matrix $P(\vartheta)$ is said to be ergodic if $P(\vartheta)$ is regular, that is, if there exists some $n_0 \geq 1$ such that all entries of $P^{n_0}(\vartheta)$ are strictly positive. This condition means that it is possible, with a strictly positive probability, to go from any level $i$ to any other level $j$ in a finite number of transitions or, in other words, that all states of the Markov chain are accessible from any initial state in a finite number of steps.

All bonus-malus scales in practical use have a 'best' level, with the property that a policy in that level remains in the same level after a claim-free period. In our framework, the best level is level 0 and, from any level, it is possible to reach the superbonus level 0 after a sufficiently large number of claim-free years, resulting in $p^{(n)}_{l0}(\vartheta) > 0$ for all sufficiently large $n$. In the following, we restrict our attention to such non-periodic bonus rules. The transition matrix $P(\vartheta)$ associated with such a bonus-malus scale is regular, i.e. there exists some integer $n_0 \geq 1$ such that all entries of the $n_0$th power $P^{n_0}(\vartheta)$ of the one-step transition matrix are strictly positive.

4.4 Long-Term Behaviour of Bonus-Malus Systems

4.4.1 Stationary Distribution

A natural question that arises concerns the long-term behaviour of a bonus-malus system. Intuitively, we expect that the system will stabilize in the long run. Since the annual claim numbers have been assumed to be independent and identically distributed, each policyholder will ultimately stabilize around an equilibrium level corresponding to the expected annual claim frequency $\vartheta$, and will gravitate around this level.

To formalize this intuitive idea, let us compute the powers of the transition matrix $P(\vartheta)$ for $\vartheta = 0.1$ in the −1/top and −1/+2 bonus-malus scales. This is done in the following examples.

Example 4.7 (−1/Top Scale) Starting from

$$P(0.1) = \begin{pmatrix}
0.904837 & 0 & 0 & 0 & 0 & 0.095163\\
0.904837 & 0 & 0 & 0 & 0 & 0.095163\\
0 & 0.904837 & 0 & 0 & 0 & 0.095163\\
0 & 0 & 0.904837 & 0 & 0 & 0.095163\\
0 & 0 & 0 & 0.904837 & 0 & 0.095163\\
0 & 0 & 0 & 0 & 0.904837 & 0.095163
\end{pmatrix}$$

we get

$$P^2(0.1) = \begin{pmatrix}
0.818731 & 0 & 0 & 0 & 0.086107 & 0.095163\\
0.818731 & 0 & 0 & 0 & 0.086107 & 0.095163\\
0.818731 & 0 & 0 & 0 & 0.086107 & 0.095163\\
0 & 0.818731 & 0 & 0 & 0.086107 & 0.095163\\
0 & 0 & 0.818731 & 0 & 0.086107 & 0.095163\\
0 & 0 & 0 & 0.818731 & 0.086107 & 0.095163
\end{pmatrix}$$

$$P^3(0.1) = \begin{pmatrix}
0.740818 & 0 & 0 & 0.077913 & 0.086107 & 0.095163\\
0.740818 & 0 & 0 & 0.077913 & 0.086107 & 0.095163\\
0.740818 & 0 & 0 & 0.077913 & 0.086107 & 0.095163\\
0.740818 & 0 & 0 & 0.077913 & 0.086107 & 0.095163\\
0 & 0.740818 & 0 & 0.077913 & 0.086107 & 0.095163\\
0 & 0 & 0.740818 & 0.077913 & 0.086107 & 0.095163
\end{pmatrix}$$

and

$$P^4(0.1) = \begin{pmatrix}
0.670320 & 0.000000 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.670320 & 0.000000 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.670320 & 0.000000 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.670320 & 0.000000 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.670320 & 0.000000 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.000000 & 0.670320 & 0.070498 & 0.077913 & 0.086107 & 0.095163
\end{pmatrix}$$

$$P^5(0.1) = \begin{pmatrix}
0.606531 & 0.063789 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.606531 & 0.063789 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.606531 & 0.063789 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.606531 & 0.063789 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.606531 & 0.063789 & 0.070498 & 0.077913 & 0.086107 & 0.095163\\
0.606531 & 0.063789 & 0.070498 & 0.077913 & 0.086107 & 0.095163
\end{pmatrix}$$

where all the rows are identical. Of course, $P^k(0.1) = P^5(0.1)$ for any integer $k \geq 6$. This means that, whatever the initial distribution,

$$\boldsymbol{p}^{(k)}(0.1) = (0.606531,\ 0.063789,\ 0.070498,\ 0.077913,\ 0.086107,\ 0.095163)^T$$

for any $k \geq 5$. The proportion of policyholders occupying each of the levels of the −1/top scale thus remains unchanged after 5 years.

Example 4.8 (−1/+2 Scale) In this case,

$$P(0.1) = \begin{pmatrix}
0.904837 & 0 & 0.090484 & 0 & 0.004524 & 0.000155\\
0.904837 & 0 & 0 & 0.090484 & 0 & 0.004679\\
0 & 0.904837 & 0 & 0 & 0.090484 & 0.004679\\
0 & 0 & 0.904837 & 0 & 0 & 0.095163\\
0 & 0 & 0 & 0.904837 & 0 & 0.095163\\
0 & 0 & 0 & 0 & 0.904837 & 0.095163
\end{pmatrix}.$$

The convergence is now much slower:

$$P^5(0.1) = \begin{pmatrix}
0.791523 & 0.081985 & 0.088694 & 0.018169 & 0.015456 & 0.004173\\
0.788490 & 0.066822 & 0.106890 & 0.017259 & 0.014546 & 0.005992\\
0.788490 & 0.063789 & 0.073531 & 0.053651 & 0.013637 & 0.006902\\
0.606531 & 0.245749 & 0.070498 & 0.020292 & 0.050028 & 0.006902\\
0.606531 & 0.063789 & 0.252457 & 0.017259 & 0.034865 & 0.025098\\
0.606531 & 0.063789 & 0.070498 & 0.199219 & 0.031833 & 0.028131
\end{pmatrix}$$

$$P^{10}(0.1) = \begin{pmatrix}
0.784013 & 0.081747 & 0.090966 & 0.022022 & 0.016217 & 0.005037\\
0.784003 & 0.081480 & 0.090248 & 0.023009 & 0.016178 & 0.005081\\
0.777382 & 0.088092 & 0.089871 & 0.022071 & 0.017497 & 0.005087\\
0.776278 & 0.079263 & 0.099795 & 0.021694 & 0.016890 & 0.006080\\
0.776278 & 0.078160 & 0.090966 & 0.031618 & 0.016623 & 0.006356\\
0.743169 & 0.111269 & 0.089862 & 0.026100 & 0.023236 & 0.006365
\end{pmatrix}$$

$$P^{20}(0.1) = \begin{pmatrix}
0.782907 & 0.082338 & 0.090996 & 0.022276 & 0.016387 & 0.005096\\
0.782903 & 0.082332 & 0.091006 & 0.022275 & 0.016387 & 0.005097\\
0.782902 & 0.082326 & 0.090993 & 0.022295 & 0.016386 & 0.005098\\
0.782803 & 0.082424 & 0.090984 & 0.022285 & 0.016406 & 0.005098\\
0.782776 & 0.082352 & 0.091082 & 0.022278 & 0.016403 & 0.005108\\
0.782774 & 0.082327 & 0.091011 & 0.022376 & 0.016399 & 0.005113
\end{pmatrix}$$

which slowly converges to

$$\Pi(0.1) = \begin{pmatrix}
0.782901 & 0.082338 & 0.090998 & 0.022278 & 0.016387 & 0.005097\\
0.782901 & 0.082338 & 0.090998 & 0.022278 & 0.016387 & 0.005097\\
0.782901 & 0.082338 & 0.090998 & 0.022278 & 0.016387 & 0.005097\\
0.782901 & 0.082338 & 0.090998 & 0.022278 & 0.016387 & 0.005097\\
0.782901 & 0.082338 & 0.090998 & 0.022278 & 0.016387 & 0.005097\\
0.782901 & 0.082338 & 0.090998 & 0.022278 & 0.016387 & 0.005097
\end{pmatrix}.$$

In this case, the system is not stable after 20 years.

Let us consider the trajectory of a policyholder with expected claim frequency $\vartheta$ across the levels of the bonus-malus scale. We define the stationary distribution $\boldsymbol{\pi}(\vartheta) = (\pi_0(\vartheta), \pi_1(\vartheta), \ldots, \pi_s(\vartheta))^T$ as follows: $\pi_l(\vartheta)$ is the stationary probability for a policyholder with mean frequency $\vartheta$ to be in level $l$, i.e.

$$\pi_{l_2}(\vartheta) = \lim_{n\to+\infty} p^{(n)}_{l_1 l_2}(\vartheta).$$

The term $\pi_l(\vartheta)$ is the limit value of the probability that the policyholder is in level $l$, when the number of periods tends to $+\infty$. It is also the fraction of the time a policyholder with claim frequency $\vartheta$ spends in level $l$, once the steady state has been reached. Note that $\boldsymbol{\pi}(\vartheta)$ does not depend on the starting class. This means that the $n$th power $P^n(\vartheta)$ of the one-step transition matrix $P(\vartheta)$ converges to a matrix $\Pi(\vartheta)$ with all rows equal to $\boldsymbol{\pi}^T(\vartheta)$, that is,

$$\lim_{n\to+\infty} P^n(\vartheta) = \Pi(\vartheta) = \begin{pmatrix} \boldsymbol{\pi}^T(\vartheta)\\ \boldsymbol{\pi}^T(\vartheta)\\ \vdots\\ \boldsymbol{\pi}^T(\vartheta) \end{pmatrix},$$

exactly as we saw in the introductory examples.

Let us now explain how to compute the $\pi_l(\vartheta)$s. Taking the limit in both sides of (4.5) for $n \to +\infty$, we see that the vector $\boldsymbol{\pi}(\vartheta)$ is the unique probabilistic solution to the system of linear equations

$$\pi_j(\vartheta) = \sum_{l=0}^{s}\pi_l(\vartheta)\,p_{lj}(\vartheta), \quad j \in \{0, \ldots, s\}. \qquad (4.7)$$

In matrix notation, (4.7) can be written as

$$\begin{cases} \boldsymbol{\pi}^T(\vartheta) = \boldsymbol{\pi}^T(\vartheta)\,P(\vartheta)\\ \boldsymbol{\pi}^T(\vartheta)\,\boldsymbol{e} = 1 \end{cases} \qquad (4.8)$$

where $\boldsymbol{e}$ is a column vector of 1s. This means that $\boldsymbol{\pi}(\vartheta)$ is the left eigenvector $\boldsymbol{u}_0$ of $P(\vartheta)$ encountered above. Thus we see that if the initial distribution in the scale is $\boldsymbol{\pi}(\vartheta)$, then the probability distribution remains equal to $\boldsymbol{\pi}(\vartheta)$.

4.4.2 Rolski–Schmidli–Schmidt–Teugels Formula

Let $E$ be the $(s+1)\times(s+1)$ matrix all of whose entries are 1, i.e. consisting of $s+1$ column vectors $e$. Then, the following result provides a direct method to get $\boldsymbol{\pi}(\lambda)$.

Property 4.2 Assume that the stochastic matrix $P(\lambda)$ is regular. Then the matrix $I - P(\lambda) + E$ is invertible and the solution $\boldsymbol{\pi}(\lambda)$ of (4.8) is given by

$$\boldsymbol{\pi}^T(\lambda) = e^T \big(I - P(\lambda) + E\big)^{-1}. \tag{4.9}$$

Proof
Let us first check that $I - P(\lambda) + E$ is invertible. We must show that

$$\big(I - P(\lambda) + E\big)x = 0 \;\Rightarrow\; x = 0.$$

From (4.8), we have $\boldsymbol{\pi}^T(\lambda)\big(I - P(\lambda)\big) = 0$. Thus,

$$\big(I - P(\lambda) + E\big)x = 0 \;\Rightarrow\; 0 = \boldsymbol{\pi}^T(\lambda)\big(I - P(\lambda) + E\big)x = 0 + \boldsymbol{\pi}^T(\lambda)Ex \;\Leftrightarrow\; \boldsymbol{\pi}^T(\lambda)Ex = 0.$$

On the other hand, $\boldsymbol{\pi}^T(\lambda)E = e^T$. Thus, $e^T x = 0$, which implies $Ex = 0$. Consequently,

$$\big(I - P(\lambda)\big)x = 0 \;\Leftrightarrow\; P(\lambda)x = x.$$

This implies for any $n \ge 1$ that

$$x = P^n(\lambda)x \to \Pi(\lambda)x,$$

i.e. $x_i = \sum_{j=0}^{s} \pi_j(\lambda)x_j$ for all $i = 0, \ldots, s$. Because the right-hand side of these equations does not depend on $i$, we have $x = ce$ for some $c \in \mathbb{R}$. Since we also have

$$0 = e^T x = c\, e^T e = c(s+1) \;\Rightarrow\; c = 0,$$

the matrix $I - P(\lambda) + E$ is invertible. Furthermore, since $\boldsymbol{\pi}^T(\lambda)\big(I - P(\lambda)\big) = 0$, we have

$$\boldsymbol{\pi}^T(\lambda)\big(I - P(\lambda) + E\big) = \boldsymbol{\pi}^T(\lambda)E = e^T.$$

This proves (4.9). □

If the number $s+1$ of states is small, the matrix $I - P(\lambda) + E$ can easily be inverted. For larger $s+1$, numerical methods have to be used, like the Gaussian elimination algorithm.
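Property 4.2 translates into a couple of lines of linear algebra. A sketch (reusing the hypothetical `transition_matrix` helper above; solving the transposed linear system avoids forming the inverse explicitly):

```python
import numpy as np

def stationary_rsst(P):
    """Stationary distribution via (4.9): pi^T = e^T (I - P + E)^{-1}."""
    m = P.shape[0]
    A = np.eye(m) - P + np.ones((m, m))       # I - P(lambda) + E
    return np.linalg.solve(A.T, np.ones(m))   # A^T pi = e  <=>  pi^T = e^T A^{-1}

pi = stationary_rsst(transition_matrix(0.1))
print(np.round(pi, 6))  # (0.782901, 0.082338, 0.090998, 0.022278, 0.016387, 0.005097)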

Example 4.9 (−1/Top Scale) We have seen above that $p^{(5)}_{5l}(\lambda) = \pi_l(\lambda)$ for $l = 0, \ldots, 5$, so that the system needs 5 years to reach stationarity (i.e. the time needed by the best policyholders starting from level 5 to arrive in level 0). Formula (4.9) gives here

$$
\boldsymbol{\pi}^T(\lambda) = (1\;1\;1\;1\;1\;1)
\begin{pmatrix}
2-\exp(-\lambda) & 1 & 1 & 1 & 1 & \exp(-\lambda) \\
1-\exp(-\lambda) & 2 & 1 & 1 & 1 & \exp(-\lambda) \\
1 & 1-\exp(-\lambda) & 2 & 1 & 1 & \exp(-\lambda) \\
1 & 1 & 1-\exp(-\lambda) & 2 & 1 & \exp(-\lambda) \\
1 & 1 & 1 & 1-\exp(-\lambda) & 2 & \exp(-\lambda) \\
1 & 1 & 1 & 1 & 1-\exp(-\lambda) & 1+\exp(-\lambda)
\end{pmatrix}^{-1}.
$$

Applying this formula in the particular case $\lambda = 0.1$, we get

$$
\boldsymbol{\pi}^T(0.1) = (1\;1\;1\;1\;1\;1)
\begin{pmatrix}
1.095163 & 1 & 1 & 1 & 1 & 0.904837 \\
0.095163 & 2 & 1 & 1 & 1 & 0.904837 \\
1 & 0.095163 & 2 & 1 & 1 & 0.904837 \\
1 & 1 & 0.095163 & 2 & 1 & 0.904837 \\
1 & 1 & 1 & 0.095163 & 2 & 0.904837 \\
1 & 1 & 1 & 1 & 0.095163 & 1.904837
\end{pmatrix}^{-1}
$$


$$
= (1\;1\;1\;1\;1\;1)
\begin{pmatrix}
2.305672 & -0.678486 & -0.565648 & -0.440943 & -0.303122 & -0.150806 \\
1.305672 & 0.321514 & -0.565648 & -0.440943 & -0.303122 & -0.150806 \\
0.400834 & 0.226351 & 0.434352 & -0.440943 & -0.303122 & -0.150806 \\
-0.417897 & 0.140245 & 0.339189 & 0.559057 & -0.303122 & -0.150806 \\
-1.158715 & 0.062332 & 0.253083 & 0.463895 & 0.696878 & -0.150806 \\
-1.829035 & -0.008166 & 0.175170 & 0.377788 & 0.601716 & 0.849194
\end{pmatrix}
$$

$$
= (0.606531\;\;0.063789\;\;0.070498\;\;0.077913\;\;0.086107\;\;0.095163).
$$

Coming back to Example 4.7, we see that the matrices $P^k(0.1)$, $k \ge 5$, have all their rows equal to $\boldsymbol{\pi}^T(0.1)$.

Example 4.10 (−1/+2 Scale) Formula (4.9) gives here

$$
\boldsymbol{\pi}^T(\lambda) = (1\;1\;1\;1\;1\;1)
\begin{pmatrix}
2-\exp(-\lambda) & 1 & 1-\lambda\exp(-\lambda) & 1 & 1-\frac{\lambda^2}{2}\exp(-\lambda) & \exp(-\lambda)\left(1+\lambda+\frac{\lambda^2}{2}\right) \\
1-\exp(-\lambda) & 2 & 1 & 1-\lambda\exp(-\lambda) & 1 & \exp(-\lambda)(1+\lambda) \\
1 & 1-\exp(-\lambda) & 2 & 1 & 1-\lambda\exp(-\lambda) & \exp(-\lambda)(1+\lambda) \\
1 & 1 & 1-\exp(-\lambda) & 2 & 1 & \exp(-\lambda) \\
1 & 1 & 1 & 1-\exp(-\lambda) & 2 & \exp(-\lambda) \\
1 & 1 & 1 & 1 & 1-\exp(-\lambda) & 1+\exp(-\lambda)
\end{pmatrix}^{-1}.
$$

Applying this formula in the particular case $\lambda = 0.1$, we get

$$
\boldsymbol{\pi}^T(0.1) = (1\;1\;1\;1\;1\;1)
\begin{pmatrix}
1.095163 & 1 & 0.909516 & 1 & 0.995476 & 0.999845 \\
0.095163 & 2 & 1 & 0.909516 & 1 & 0.995321 \\
1 & 0.095163 & 2 & 1 & 0.909516 & 0.995321 \\
1 & 1 & 0.095163 & 2 & 1 & 0.904837 \\
1 & 1 & 1 & 0.095163 & 2 & 0.904837 \\
1 & 1 & 1 & 1 & 0.095163 & 1.904837
\end{pmatrix}^{-1}
$$

$$
= (1\;1\;1\;1\;1\;1)
\begin{pmatrix}
2.759072 & -0.630802 & -0.512948 & -0.658608 & -0.480599 & -0.309449 \\
1.659534 & 0.358730 & -0.524518 & -0.561440 & -0.472165 & -0.293474 \\
0.578107 & 0.244995 & 0.454957 & -0.475982 & -0.366345 & -0.269065 \\
-0.478699 & 0.133850 & 0.332122 & 0.599117 & -0.272234 & -0.147489 \\
-1.434937 & 0.033282 & 0.220977 & 0.571906 & 0.812921 & -0.037482 \\
-2.300176 & -0.057717 & 0.120409 & 0.547285 & 0.794810 & 1.062056
\end{pmatrix}
$$

$$
= (0.782901\;\;0.082338\;\;0.090998\;\;0.022278\;\;0.016387\;\;0.005097).
$$



4.4.3 Dufresne Algorithm

Dufresne (1988, 1995) proposed a simple and efficient iterative algorithm for deriving $\boldsymbol{\pi}(\lambda)$, provided that the driver goes one level down if no claims are filed with the company, and goes $n \times \mathrm{pen}$ levels up if $n$ claims are reported to the insurer. Then, the move $\xi_{k+1}$ at the end of year $k$ can be modelled as

$$\xi_{k+1} = \begin{cases} -1 & \text{if no claims} \\ n \times \mathrm{pen} & \text{if } n \text{ claims.} \end{cases}$$

If the annual numbers of claims $N_1, N_2, \ldots$ are independent and $\mathcal{P}oi(\lambda)$ distributed, then the sequence $\xi_1, \xi_2, \ldots$ is made up of independent and identically distributed random variables, with common probability mass function

$$\Pr[\xi_{k+1} = -1] = \Pr[N_{k+1} = 0] = \exp(-\lambda)$$
$$\Pr[\xi_{k+1} = n \times \mathrm{pen}] = \Pr[N_{k+1} = n] = \exp(-\lambda)\frac{\lambda^n}{n!} \quad\text{for } n = 1, 2, \ldots$$
$$\Pr[\xi_{k+1} = j] = 0 \quad\text{otherwise.}$$

The level $L_{k+1}$ can then be represented as

$$L_{k+1} = \begin{cases} L_k + \xi_{k+1} & \text{if } 0 \le L_k + \xi_{k+1} \le s \\ 0 & \text{if } L_k + \xi_{k+1} = -1 \\ s & \text{if } L_k + \xi_{k+1} > s. \end{cases}$$

Let us denote as $F_k(\cdot)$ the distribution function of $L_k$, that is

$$F_k(l) = \Pr[L_k \le l], \qquad l = 0, 1, \ldots, s.$$

Furthermore, let us denote as $p_\xi(\cdot)$ the common probability mass function of the $\xi_k$s. We then have

$$F_{k+1}(l) = \sum_{y=-1}^{l} F_k(l-y)\, p_\xi(y)$$

with $F_{k+1}(s) = 1$. The stationary distribution $F(\lambda; \cdot)$ is then obtained as

$$F(\lambda; l) = \lim_{k\to+\infty} F_k(l) = \sum_{y=-1}^{l} F(\lambda; l-y)\, p_\xi(y)$$

with $F(\lambda; s) = 1$. Obviously, the $\pi_l(\lambda)$s are then recovered from $\pi_0(\lambda) = F(\lambda; 0)$ and

$$\pi_l(\lambda) = F(\lambda; l) - F(\lambda; l-1) \quad\text{for } l = 1, \ldots, s.$$

The values of $F(\lambda; l)$ can be computed recursively from the following algorithm:


(i) set $A(0) = 1$;
(ii) compute, for $l = 0, 1, \ldots, s-1$,

$$A(l+1) = \frac{1}{p_\xi(-1)}\left(A(l) - \sum_{y=0}^{l} A(l-y)\, p_\xi(y)\right);$$

(iii) set

$$F(\lambda; l) = \frac{A(l)}{A(s)} \quad\text{for } l = 0, 1, \ldots, s.$$

This recursive formula is computationally efficient, and easy to implement.
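A possible implementation, sketched for the same −1/+pen setting as the earlier snippets (helper names and defaults are ours, not the book's):

```python
import numpy as np
from math import exp, factorial

def stationary_dufresne(lam, s=5, pen=2):
    """Dufresne's recursion for the stationary distribution of a -1/+pen
    scale with levels 0..s and Poisson(lam) annual claim numbers."""
    def p_xi(y):
        # untruncated pmf of the annual move xi; only y = -1, 0, ..., s-1 is ever used
        if y == -1:
            return exp(-lam)                       # no claim: one level down
        if y > 0 and y % pen == 0:                 # n = y/pen claims: y levels up
            n = y // pen
            return exp(-lam) * lam ** n / factorial(n)
        return 0.0

    A = np.zeros(s + 1)
    A[0] = 1.0
    for l in range(s):
        # A(l+1) = ( A(l) - sum_{y=0}^{l} A(l-y) p_xi(y) ) / p_xi(-1)
        acc = sum(A[l - y] * p_xi(y) for y in range(0, l + 1))
        A[l + 1] = (A[l] - acc) / p_xi(-1)
    F = A / A[s]                                   # normalise so that F(lam; s) = 1
    return np.diff(np.concatenate(([0.0], F)))     # pi_l = F(l) - F(l-1)

print(np.round(stationary_dufresne(0.1), 6))       # matches the RSST result above
```

For $\lambda = 0.1$ this reproduces the stationary distribution $(0.782901, 0.082338, \ldots)$ of Example 4.10, at the cost of a single backward pass instead of a matrix inversion.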

4.4.4 Convergence to the Stationary Distribution

Geometric Bound for the Speed of Convergence

What matters for a unique limit to exist is that one can find a positive integer $n_0$ such that

$$\epsilon(\lambda) = \min_{i,j \in \{0,\ldots,s\}} p^{(n_0)}_{ij}(\lambda) > 0.$$

In other words, $n_0$ is such that all the entries of $P^{n_0}(\lambda)$ are positive, and thus it is possible to reach any level starting from any other level in $n_0$ periods. Various inequalities indicate the speed of convergence to the limit distribution $\boldsymbol{\pi}(\lambda)$. It can be shown that

$$\Big|p^{(n)}_{ij}(\lambda) - \pi_j(\lambda)\Big| \le \max_{i\in\{0,\ldots,s\}} p^{(n)}_{ij}(\lambda) - \min_{i\in\{0,\ldots,s\}} p^{(n)}_{ij}(\lambda) \le \big(1-\epsilon(\lambda)\big)^{\lfloor n/n_0\rfloor - 1}.$$

This inequality provides us with a geometric bound for the rate of convergence to the limit distribution. Further bounds can be determined using concepts of matrix algebra.
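Both $n_0$ and $\epsilon(\lambda)$ are cheap to evaluate for the small matrices considered here; a sketch, reusing the illustrative `transition_matrix` helper:

```python
import numpy as np

def geometric_bound(P, n):
    """Smallest n0 with all entries of P^n0 positive, the corresponding
    eps, and the bound (1 - eps)^(floor(n/n0) - 1) reconstructed above."""
    n0, Q = 1, P.copy()
    while (Q <= 0).any():          # keep multiplying until all entries are positive
        Q = Q @ P
        n0 += 1
    eps = Q.min()
    return n0, eps, (1 - eps) ** (n // n0 - 1)

print(geometric_bound(transition_matrix(0.1), n=20))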

Total Variation Distance

The total variation metric is often used to measure the distance to the stationary distribution $\boldsymbol{\pi}(\lambda)$. Recall that the total variation distance between two random variables $X$ and $Y$, denoted as $d_{TV}(X,Y)$, is given by

$$d_{TV}(X,Y) = \int_{-\infty}^{+\infty} \big|dF_X(t) - dF_Y(t)\big|. \tag{4.10}$$

For counting random variables $M$ and $N$, (4.10) obviously reduces to

$$d_{TV}(M,N) = \sum_{k=0}^{+\infty} \big|\Pr[M=k] - \Pr[N=k]\big|.$$

There is a close connection between $d_{TV}$ and the standard variation distance, which considers the supremum of the difference between the probability masses given to some random events.


Specifically, given two random variables $X$ and $Y$, $d_{TV}(X,Y)$ can be represented as

$$d_{TV}(X,Y) = 2 \sup_A \big|\Pr[X\in A] - \Pr[Y\in A]\big|. \tag{4.11}$$

Selection of the Initial Level

The main objective of a bonus-malus system is to correct the inadequacies of a priori rating by separating the good from the bad drivers. This separation process should proceed as fast as possible; the time needed to achieve this operation is the time needed to reach stationarity.

A convenient way to select the initial level has been suggested by Bonsdorff (1992). The idea is to select it in order to minimize the time needed to reach stationarity. It relies on the total variation distance $d_{TV}$ between the $n$th transient distribution starting from level $l_1$, i.e. $p^{(n)}(\lambda; l_1, l_2)$, $l_2 = 0, 1, \ldots, s$, and the stationary distribution $\pi_{l_2}(\lambda)$, $l_2 = 0, 1, \ldots, s$, computed as

$$d_{TV}(\lambda; l_1, n) = \sum_{l_2=0}^{s} \Big|p^{(n)}(\lambda; l_1, l_2) - \pi_{l_2}(\lambda)\Big|;$$

it measures the degree of convergence of the system after $n$ transitions. Of course,

$$\lim_{n\to+\infty} d_{TV}(\lambda; l_1, n) = 0 \quad\text{for all } l_1 \text{ and } \lambda.$$
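Numerically, $d_{TV}(\lambda; l_1, n)$ is one matrix power away; a minimal sketch with the illustrative helpers above:

```python
import numpy as np

def dtv(P, pi, l1, n):
    """Total variation distance between the n-step distribution started
    from level l1 and the stationary distribution pi."""
    return np.abs(np.linalg.matrix_power(P, n)[l1] - pi).sum()

P = transition_matrix(0.1)
pi = stationary_rsst(P)
# degree of convergence after 10 years, for each possible initial level
print([round(dtv(P, pi, l1, 10), 4) for l1 in range(6)])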

The convergence to $\boldsymbol{\pi}(\lambda)$ is essentially controlled by the second largest eigenvalue $\xi_1$ of $P(\lambda)$. For any $\xi > \xi_1$, there exists an $a$



4.5.2 Bayesian Relativities

Predictive accuracy is a useful measure of the efficiency of a bonus-malus scale. The idea behind this notion is as follows: a bonus-malus scale is good at discriminating among the good and the bad drivers if the premium they pay is close to their 'true' premium. According to Norberg (1976), once the number of classes and the transition rules have been fixed, the optimal relativity $r_l$ associated with level $l$ is determined by maximizing the asymptotic predictive accuracy.

Let us pick at random a policyholder from the portfolio. Both the a priori expected claim frequency and the relative risk parameter are random in this case. Let us denote as $\Lambda$ the (random) a priori expected claim frequency of this randomly selected policyholder, and as $\Theta$ the residual effect of the risk factors not included in the ratemaking. The actual (unknown) annual expected claim frequency of this policyholder is then $\Lambda\Theta$. Since the random effect $\Theta$ represents residual effects of hidden covariates, the random variables $\Lambda$ and $\Theta$ may reasonably be assumed to be mutually independent. Let $w_k$ be the weight of the $k$th risk class whose annual expected claim frequency is $\lambda_k$. Clearly, $\Pr[\Lambda = \lambda_k] = w_k$.

Let $L$ be the level occupied by this randomly selected policyholder once the steady state has been reached. The distribution of $L$ can be written as

$$\Pr[L = l] = \sum_k w_k \int_0^{+\infty} \pi_l(\lambda_k\theta)\, dF_\Theta(\theta). \tag{4.12}$$

Here, $\Pr[L = l]$ represents the proportion of the policyholders in level $l$.

Our aim is to minimize the expected squared difference between the 'true' relative premium $\Theta$ and the relative premium $r_L$ applicable to this policyholder (after the steady state has been reached), i.e. the goal is to minimize

$$E\big[(\Theta - r_L)^2\big] = \sum_{l=0}^{s} E\big[(\Theta - r_l)^2 \,\big|\, L = l\big] \Pr[L = l] = \sum_k w_k \int_0^{+\infty} \sum_{l=0}^{s} (\theta - r_l)^2\, \pi_l(\lambda_k\theta)\, dF_\Theta(\theta).$$

The solution is given by

$$\begin{aligned}
r_l &= E[\Theta \mid L = l] = E\Big[E[\Theta \mid \Lambda, L = l]\,\Big|\, L = l\Big]\\
&= \sum_k E[\Theta \mid \Lambda = \lambda_k, L = l]\, \Pr[\Lambda = \lambda_k \mid L = l]\\
&= \sum_k \left(\int_0^{+\infty} \theta\, \frac{\Pr[L = l \mid \Lambda = \lambda_k, \Theta = \theta]}{\Pr[L = l \mid \Lambda = \lambda_k]}\, dF_\Theta(\theta)\right) \frac{\Pr[L = l \mid \Lambda = \lambda_k]\, w_k}{\Pr[L = l]}
\end{aligned}$$
PrL = l



$$= \frac{\sum_k w_k \int_0^{+\infty} \theta\, \pi_l(\lambda_k\theta)\, dF_\Theta(\theta)}{\sum_k w_k \int_0^{+\infty} \pi_l(\lambda_k\theta)\, dF_\Theta(\theta)}. \tag{4.13}$$

It is easily seen that

$$E[r_L] = E\big[E[\Theta \mid L]\big] = E[\Theta] = 1,$$

resulting in financial equilibrium once steady state is reached.

Remark 4.2 If the insurance company does not enforce any a priori ratemaking system, all the $\lambda_k$s are equal to $E[\Lambda] = \lambda$ and (4.13) reduces to the formula

$$r_l = \frac{\int_0^{+\infty} \theta\, \pi_l(\lambda\theta)\, dF_\Theta(\theta)}{\int_0^{+\infty} \pi_l(\lambda\theta)\, dF_\Theta(\theta)} \tag{4.14}$$

that has been derived in Norberg (1976).
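For instance, (4.14) can be evaluated by brute-force quadrature once a distribution is chosen for $\Theta$. The sketch below assumes, as in the Negative Binomial fits used throughout the chapter, that $\Theta$ is Gamma distributed with unit mean (density $a^a\theta^{a-1}e^{-a\theta}/\Gamma(a)$), and reuses the illustrative −1/+2 helpers of Section 4.4; the grid is crude, so the output should only come close to the corresponding published figures (the third column of Table 4.5).

```python
import numpy as np
from math import gamma as gamma_fn

# Theta ~ Gamma(a, a) with a_hat = 0.889 and lambda_hat = 0.1474, as in the
# examples below; transition_matrix and stationary_rsst are the sketches above.
a, lam = 0.889, 0.1474
theta = np.linspace(1e-4, 20, 4001)
dx = theta[1] - theta[0]
dens = a**a * theta**(a - 1) * np.exp(-a * theta) / gamma_fn(a)

pi = np.array([stationary_rsst(transition_matrix(lam * t)) for t in theta])
num = (theta[:, None] * pi * dens[:, None]).sum(axis=0) * dx   # integral of theta pi_l dF
den = (pi * dens[:, None]).sum(axis=0) * dx                    # integral of pi_l dF
print(np.round(100 * num / den, 1))                            # r_l in %, levels 0..5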

The way a priori and a posteriori ratemakings interact is described by

$$E[\Lambda \mid L = l] = \sum_k \lambda_k \Pr[\Lambda = \lambda_k \mid L = l] = \sum_k \lambda_k \frac{\Pr[L = l \mid \Lambda = \lambda_k]\, w_k}{\Pr[L = l]} = \frac{\sum_k \lambda_k w_k \int_0^{+\infty} \pi_l(\lambda_k\theta)\, dF_\Theta(\theta)}{\sum_k w_k \int_0^{+\infty} \pi_l(\lambda_k\theta)\, dF_\Theta(\theta)}. \tag{4.15}$$

If $E[\Lambda \mid L = l]$ is indeed increasing in the level $l$, those policyholders who have been granted premium discounts at policy issuance (on the basis of their observable characteristics) will also be rewarded a posteriori (because they occupy the lowest levels of the bonus-malus scale). Conversely, the policyholders who have been penalized at policy issuance (because of their observable characteristics) will cluster in the highest bonus-malus levels and will consequently be penalized again.

Example 4.11 (−1/Top Scale, Portfolio A) The results for the bonus-malus scale −1/top are displayed in Table 4.3. Specifically, the values in the third column are computed with the help of (4.14) with $\hat a = 0.889$ and $\hat\lambda = 0.1474$. Those values were obtained in Section 1.6 by fitting a Negative Binomial distribution to the portfolio observed claim frequencies given in Table 1.1. Integrations have been performed numerically with the QUAD procedure of SAS®/IML. The fourth column is based on (4.13) with $\hat a = 1.065$ and the $\hat\lambda_k$s listed in Table 2.7.

Once the steady state has been reached, the majority of the policies (58.5 % of the portfolio) occupy level 0 and enjoy the maximum discount. The remaining 41.5 % of the portfolio are distributed over levels 1–5, with about 13 % in level 5 (those policyholders who just claimed). Concerning the relativities, the minimum percentage of 54.7 % when the a priori ratemaking is not recognized becomes 61.2 % when the relativities are adapted to the a priori risk classification.
priori ratemaking is not recognized becomes 61.2 % when the relativities are adapted to the



Table 4.3 Numerical characteristics for the system −1/top and for Portfolio A.

Level l   Pr[L = l]   r_l = E[Θ | L = l]   r_l = E[Θ | L = l]   E[Λ | L = l]
                      without a priori     with a priori
                      ratemaking           ratemaking

5          12.8 %         197.3 %              181.2 %             16.3 %
4           9.7 %         170.9 %              159.9 %             15.8 %
3           7.7 %         150.7 %              143.9 %             15.5 %
2           6.2 %         134.8 %              131.3 %             15.2 %
1           5.2 %         122.0 %              120.9 %             15.0 %
0          58.5 %          54.7 %               61.2 %             14.1 %

Similarly, the relativity attached to the highest level, 197.3 %, gets reduced to 181.2 %. The severity of the a posteriori corrections is thus weaker once the a priori ratemaking is taken into account in the determination of the $r_l$s. The last column of Table 4.3 indicates the extent to which a priori and a posteriori ratemakings interact. The numbers in this column are computed as (4.15). The average a priori expected claim frequency clearly increases with the level $l$ occupied by the policyholder.

Example 4.12 (−1/Top Scale, Portfolio B) The results for the bonus-malus scale −1/top are displayed in Table 4.4. We only give the relativities computed by taking into account the a priori risk classification, which is taken from Table 2.16 with $\hat a = 0.677$. We see that in Portfolio B, only 46.6 % of the policyholders occupy level 0. The relativities are now less dispersed, ranging from 70.6 % to 146.9 % (instead of 61.2 % to 181.2 %). Again, the last column indicates that a priori risk classification and a posteriori premium corrections interact.

Example 4.13 (−1/+2 Scale, Portfolio A) Results are displayed in Table 4.5, which is the analogue of Table 4.3 for the bonus-malus scale −1/+2. The bonus-malus system is perhaps too soft, since the vast majority of the portfolio (about 71 %) clusters in the super bonus level 0. The higher levels are occupied by a very small minority of drivers. Such a system does not really discriminate between good and bad drivers.

Table 4.4 Numerical characteristics for the system −1/top and for Portfolio B.

Level l   Pr[L = l]   r_l = E[Θ | L = l]   E[Λ | L = l]
                      with a priori
                      ratemaking

5          16.3 %         146.9 %             19.5 %
4          12.5 %         129.3 %             19.3 %
3           9.9 %         117.2 %             19.1 %
2           8.0 %         108.0 %             19.0 %
1           6.6 %         100.7 %             18.9 %
0          46.6 %          70.6 %             18.4 %
0 466% 706% 184%



Table 4.5 Numerical characteristics for the system −1/+2 and for Portfolio A.

Level l   Pr[L = l]   r_l = E[Θ | L = l]   r_l = E[Θ | L = l]   E[Λ | L = l]
                      without a priori     with a priori
                      ratemaking           ratemaking

5           4.4 %         309.1 %              271.4 %             18.5 %
4           4.7 %         241.4 %              218.5 %             17.1 %
3           4.4 %         207.7 %              192.5 %             16.4 %
2           8.7 %         142.9 %              138.8 %             15.3 %
1           7.1 %         130.2 %              128.6 %             15.1 %
0          70.6 %          62.4 %               68.5 %             14.2 %

Consequently, only those policyholders in level 0 get some discount, whereas occupancy of any level 1–5 implies some penalty. Again, the a posteriori corrections are softened when a priori risk classification is taken into account in the determination of the $r_l$s. The comments made for the scale −1/top still apply to this bonus-malus scale.

Example 4.14 (−1/+2 Scale, Portfolio B) Results are displayed in Table 4.6, which is the analogue of Table 4.4 for the bonus-malus scale −1/+2. The comparison with Portfolio A yields the same comments as before.

Example 4.15 (−1/+3 Scale, Portfolio A) Let us now make the −1/+2 bonus-malus scale more severe: to this end, each claim is now penalized by 3 levels (instead of 2 in the −1/+2 system). The numerical results are displayed in Table 4.7.

We see that fewer policyholders occupy level 0 (64.5 % compared with 70.6 % with the −1/+2 system), and that the upper levels are now more populated. Drivers in level 0 deserve more bonus compared to the −1/+2 system (they pay 57.8 % of the base premium compared to 62.4 % in the non-segmented case, and 64.2 % compared to 68.5 % in the segmented case). Also, the maximal penalties get reduced when claims are more severely penalized.

Example 4.16 (−1/+3 Scale, Portfolio B) Let us now consider Portfolio B where each claim is penalized by 3 levels (instead of 2 in the −1/+2 system). The numerical results are displayed in Table 4.8.

Table 4.6 Numerical characteristics for the system −1/+2 and for Portfolio B.

Level l   Pr[L = l]   r_l = E[Θ | L = l]   E[Λ | L = l]
                      with a priori
                      ratemaking

5           5.4 %         223.2 %             20.4 %
4           6.0 %         171.3 %             19.9 %
3           5.8 %         148.9 %             19.6 %
2          11.5 %         112.1 %             19.1 %
1           9.3 %         105.0 %             19.0 %
0          62.0 %          74.7 %             18.5 %
0 620% 747% 185%



Table 4.7 Numerical characteristics for the system −1/+3 and for Portfolio A.

Level l   Pr[L = l]   r_l = E[Θ | L = l]   r_l = E[Θ | L = l]   E[Λ | L = l]
                      without a priori     with a priori
                      ratemaking           ratemaking

5           7.3 %         257.1 %              230.8 %             17.4 %
4           5.9 %         219.4 %              200.9 %             16.7 %
3           9.0 %         151.7 %              145.2 %             15.5 %
2           7.3 %         136.5 %              133.1 %             15.2 %
1           6.0 %         124.0 %              123.0 %             15.0 %
0          64.5 %          57.8 %               64.2 %             14.1 %

Table 4.8 Numerical characteristics for the system −1/+3 and for Portfolio B.

Level l   Pr[L = l]   r_l = E[Θ | L = l]   E[Λ | L = l]
                      with a priori
                      ratemaking

5           9.2 %         184.0 %             20.0 %
4           7.8 %         156.6 %             19.7 %
3          11.7 %         117.4 %             19.2 %
2           9.5 %         108.6 %             19.0 %
1           7.8 %         101.6 %             18.9 %
0          54.0 %          72.0 %             18.4 %

The comparison with Portfolio A shows that the relativities are much less dispersed here.

4.5.3 Interaction between Bonus-Malus Systems and a Priori Ratemaking

Since the relativities attached to the different levels are the same whatever the risk class to which the policyholders belong, those scales overpenalize a priori bad risks. Let us explain this phenomenon, pointed out by Taylor (1997). Over time, policyholders will be distributed over the levels of the bonus-malus scale. Since their trajectory is a function of past claims history, policyholders with low a priori expected claim frequencies will tend to gravitate to the lowest levels of the scale. Conversely for individuals with high a priori expected claim frequencies. Consider for instance a policyholder with a high a priori expected claim frequency, a young male driver living in an urban area, say. This driver is expected to report many claims (this is precisely why he has been penalized a priori) and so to be transferred to the highest levels of the bonus-malus scale. On the contrary, a policyholder with a low a priori expected claim frequency, a middle-aged lady living in a rural area, say, is expected to report few claims and so to gravitate to the lowest levels of the scale. The level occupied by the policyholders in the bonus-malus scale can thus be partly explained by their observable characteristics included in the price list. It is thus fair to isolate that part of the information contained in the level occupied by the policyholder that does not reflect observable characteristics. A posteriori corrections should be driven only by this part of the bonus-malus information.

In credibility theory, we have seen that to the extent that good drivers are rewarded in their base premiums (through other rating variables), the size of the bonus they require for equity is reduced. This can be summarized as follows:

• when a priori segmentation is used, the severity of the a posteriori differentiation has to be lowered;
• when both a priori and a posteriori ratemakings are used, the size of the bonus is always smaller for good drivers than for bad ones.

If a single bonus-malus scale is applied to the entire portfolio, even if the relativities take a priori risk classification into account, the resulting ratemaking is unfair to a priori bad drivers. The bonuses and the maluses need to be functions of the drivers' a priori characteristics, used in the price list.

We know from credibility theory that the a posteriori corrections are functions of the a priori characteristics; see for instance (3.16). On the contrary, when a bonus-malus system is in force, the same a posteriori corrections apply to all policyholders, whatever their a priori expected claim frequency. This of course induces unfairness in the portfolio.

In order to reduce the unfairness of the tariff, we could propose several bonus-malus scales, according to the a priori characteristics. The idea is to select a few a priori characteristics inducing large differences in expected claim frequencies (typically, those associated with the largest regression coefficients), and to build separate scales according to these characteristics.

Example 4.17 (−1/+2 System, Portfolio A, with a Dichotomy Rural–Urban) Table 4.9 describes such a system where the company differentiates policyholders according to the type of district where they live (urban or rural). People living in urban areas have higher a priori expected claim frequencies. Thus, they should be better rewarded when they do not file any claim, and less penalized when they report accidents, compared to people living in rural zones. This is indeed what we observe when we compare the relative premiums obtained for the system −1/+2: the maximal discount is 66.6 % for urban policyholders, compared to 71.3 % for rural ones. Similarly, the highest penalty is 267.2 % for urban drivers against 280.4 % for rural ones.

Table 4.9 Relativities obtained by differentiating policyholders according to the type of district for the system −1/+2 and for Portfolio A.

Level l   Rural     Urban

5         280.4 %   267.2 %
4         226.4 %   214.3 %
3         200.8 %   187.9 %
2         144.4 %   135.3 %
1         134.3 %   124.9 %
0          71.3 %    66.6 %
0 713% 666%



4.5.4 Linear Relativities

As demonstrated in the numerical examples with the −1/top, −1/+2 and −1/+3 scales, the relativities obtained above may exhibit a rather irregular pattern, and this may be undesirable for commercial purposes. It may therefore be interesting to smooth this scale in order to obtain relativities which are regularly increasing according to the level. As suggested by Gilde & Sundt (1989), a linear scale of the form $r_l^{\mathrm{lin}} = \alpha + \beta l$, $l = 0, 1, \ldots, s$, could then be desirable. Then, Norberg's maximum accuracy criterion becomes a constrained minimization:

$$\min_{\alpha,\beta} E\big[(\Theta - r_L^{\mathrm{lin}})^2\big] = \min_{\alpha,\beta} E\big[(\Theta - \alpha - \beta L)^2\big]. \tag{4.16}$$

Setting the derivative of the objective function with respect to $\alpha$ equal to 0 yields

$$\alpha = E[\Theta] - \beta E[L].$$

Doing the same with the derivative of the objective function with respect to $\beta$ gives

$$0 = E\big[L(\Theta - \alpha - \beta L)\big] = E[L\Theta] - \alpha E[L] - \beta E[L^2].$$

Replacing the value of $\alpha$ with the expression found above, we get

$$0 = E[L\Theta] - E[L]E[\Theta] + \beta\big(E[L]\big)^2 - \beta E[L^2] = C[L,\Theta] - \beta V[L].$$

The solution of the optimization problem (4.16) is thus given by

$$\beta = \frac{C[L,\Theta]}{V[L]} \quad\text{and}\quad \alpha = E[\Theta] - \frac{C[L,\Theta]}{V[L]}\, E[L]. \tag{4.17}$$

The linear relative premium scale is thus of the form

$$r_l^{\mathrm{lin}} = 1 + \frac{C[L,\Theta]}{V[L]}\big(l - E[L]\big),$$

where

$$C[L,\Theta] = E[L\Theta] - E[L]E[\Theta] = \sum_k \sum_{l=0}^{s} w_k\, l\, E\big[\Theta\, \pi_l(\lambda_k\Theta)\big] - E[L] = \sum_k \sum_{l=0}^{s} w_k\, l \int_0^{+\infty} \theta\, \pi_l(\lambda_k\theta)\, dF_\Theta(\theta) - E[L],$$
0



$$\Pr[L = l] = \sum_k w_k \int_0^{+\infty} \pi_l(\lambda_k\theta)\, dF_\Theta(\theta),$$

$$E[L] = \sum_{l=0}^{s} l \Pr[L = l] \quad\text{and}\quad V[L] = \sum_{l=0}^{s} \big(l - E[L]\big)^2 \Pr[L = l].$$

Remark 4.3 It is interesting to note that the optimal $\alpha$ and $\beta$ solving (4.16) also minimize

$$E\Big[\big(E[\Theta \mid L] - r_L^{\mathrm{lin}}\big)^2\Big] = E\Big[\big(E[\Theta \mid L] - \alpha - \beta L\big)^2\Big],$$

so that the linear relativities provide the best linear fit to the Bayesian ones. To check this assertion, note that

$$\begin{aligned}
E\big[(\Theta - \alpha - \beta L)^2\big] &= E\Big[\big((\Theta - E[\Theta \mid L]) + (E[\Theta \mid L] - \alpha - \beta L)\big)^2\Big]\\
&= E\Big[\big(\Theta - E[\Theta \mid L]\big)^2\Big] + 2E\Big[\big(\Theta - E[\Theta \mid L]\big)\big(E[\Theta \mid L] - \alpha - \beta L\big)\Big] + E\Big[\big(E[\Theta \mid L] - \alpha - \beta L\big)^2\Big]
\end{aligned}$$

and the second term vanishes by definition of the conditional expectation $E[\Theta \mid L]$, since $\Theta - E[\Theta \mid L]$ is orthogonal to any function of $L$.
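Because of this best-linear-fit property, $\alpha$ and $\beta$ can be computed directly from the level distribution $\Pr[L=l]$ and the Bayesian relativities $r_l$, with no further integration. A minimal sketch (the inputs are the rounded −1/+2 figures of Table 4.5; the output should be close, up to rounding, to the linear column of Table 4.11):

```python
import numpy as np

def linear_relativities(p_L, r):
    """Best linear scale alpha + beta*l under (4.16)-(4.17), from Pr[L=l]
    and the Bayesian relativities r_l = E[Theta | L=l]."""
    l = np.arange(len(p_L))
    EL = l @ p_L
    VL = ((l - EL) ** 2) @ p_L
    CLT = (l * r) @ p_L - EL        # C[L,Theta] = E[L E[Theta|L]] - E[L], E[Theta] = 1
    beta = CLT / VL
    alpha = 1 - beta * EL           # financial balance E[r^lin_L] = 1
    return alpha + beta * l

p_L = np.array([0.706, 0.071, 0.087, 0.044, 0.047, 0.044])   # Pr[L=l], levels 0..5
r   = np.array([0.624, 1.302, 1.429, 2.077, 2.414, 3.091])   # r_l without a priori
print(np.round(100 * linear_relativities(p_L, r), 1))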

Example 4.18 (−1/+2 Scale, Portfolio A) Table 4.10 allows comparison of the relativities of the two scales in the segmented case. As we can see, the values are close to each other. The constant step between two levels in the linear scale is equal to 39.3 %.

The values of the expected errors $Q_1 = E\big[(\Theta - r_L)^2\big]$ and $Q_2 = E\big[(\Theta - r_L^{\mathrm{lin}})^2\big]$ are respectively given by

$$Q_1 = 0.6219 \quad\text{and}\quad Q_2 = 0.6261.$$

The small difference between $Q_1$ and $Q_2$ indicates that the additional linear restriction does not really produce any deterioration in the fit. Note that the large differences in levels 1–5 are given low weights in the computation of $Q_1$ and $Q_2$.

Table 4.11 displays the relativities without a priori segmentation. As can be observed, the scale without a priori segmentation is more elastic, which is logical since it has to take into account the full heterogeneity. The mean square errors are now given by

$$Q_1 = 0.6630 \quad\text{and}\quad Q_2 = 0.6688,$$



Table 4.10 Linear relativities with a priori risk classification for the system −1/+2 and for Portfolio A.

Level l   Unconstrained relativities r_l   Linear relativities r_l^lin
          with a priori ratemaking         with a priori ratemaking

5             271.4 %                          266.2 %
4             218.5 %                          226.9 %
3             192.5 %                          187.5 %
2             138.8 %                          148.2 %
1             128.6 %                          108.8 %
0              68.5 %                           69.5 %

Table 4.11 Linear relativities without a priori risk classification for the system −1/+2 and for Portfolio A.

Level l   Unconstrained relativities r_l   Linear relativities r_l^lin
          without a priori ratemaking      without a priori ratemaking

5             309.1 %                          298.3 %
4             241.4 %                          251.2 %
3             207.7 %                          204.1 %
2             142.9 %                          157.1 %
1             130.2 %                          110.0 %
0              62.4 %                           62.9 %

so that again the additional linear restriction does not really produce any deterioration in the fit. Obviously, $Q_1$ and $Q_2$ are higher than before (when a priori risk classification was in force). This was expected, as $\Theta$ is now more variable.

Example 4.19 (−1/+2 Scale, Portfolio B) Table 4.12 gives the linear relativities for Portfolio B. The mean square errors are

$$Q_1 = 0.4134 \quad\text{and}\quad Q_2 = 0.4182.$$

We observe large discrepancies between $r_l$ and $r_l^{\mathrm{lin}}$, especially in the higher levels. The constant penalty by step in the linear scale is 26.2 %.

4.5.5 Approximations

In Chapter 3, several discrete approximations to $\Theta$ allowed us to derive simplified versions of the credibility formulas (replacing integrals with sums). The same idea can be applied here. Considering (4.13), the expression for $r_l$ when $\Theta$ has support points $\theta_1, \ldots, \theta_q$ with respective probability masses $p_1, \ldots, p_q$ as in (3.5) becomes

$$r_l = \frac{\sum_k w_k \sum_{j=1}^{q} \theta_j\, \pi_l(\lambda_k\theta_j)\, p_j}{\sum_k w_k \sum_{j=1}^{q} \pi_l(\lambda_k\theta_j)\, p_j}.$$



Table 4.12 Linear relativities with risk classification for the system −1/+2 and for Portfolio B.

Level l   Unconstrained relativities r_l   Linear relativities r_l^lin
          with a priori ratemaking         with a priori ratemaking

5             223.2 %                          204.7 %
4             171.3 %                          178.5 %
3             148.9 %                          152.2 %
2             112.1 %                          126.0 %
1             105.0 %                           99.8 %
0              74.7 %                           73.6 %

The discrete approximations listed in Tables 3.5–3.6 can then be used in this formula. In the case where $\Theta$ has a unimodal probability density function, the mixed uniform approximations of Tables 3.8–3.9 can also be used.
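A sketch of this discrete version, reusing the hypothetical helpers of Section 4.4; the two risk classes and the three-point distribution (chosen with unit mean) are purely illustrative:

```python
import numpy as np

def relativities_discrete(lams, w, theta_pts, masses, s=5, pen=2):
    """Relativities (4.13) when Theta is approximated by a discrete
    distribution with support theta_pts and masses; lams/w describe
    the a priori risk classes."""
    num, den = np.zeros(s + 1), np.zeros(s + 1)
    for lam_k, w_k in zip(lams, w):
        for t_j, p_j in zip(theta_pts, masses):
            pi = stationary_rsst(transition_matrix(lam_k * t_j, s=s, pen=pen))
            num += w_k * p_j * t_j * pi
            den += w_k * p_j * pi
    return num / den

r = relativities_discrete([0.10, 0.18], [0.6, 0.4],          # two classes
                          [0.5, 1.0, 2.0], [1/3, 1/2, 1/6])  # E[Theta] = 1
print(np.round(100 * r, 1))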

4.6 Relativities with an Exponential Loss Function

4.6.1 Bayesian Relativities

This section proposes an asymmetric loss function with one parameter that reflects the severity of the bonus-malus system. In order to reduce the maluses obtained with a quadratic loss, while keeping a financially balanced system, we resort to an exponential loss function. Such loss functions have been applied in Section 3.4 in the classical credibility setting. Our purpose here is to apply exponential loss functions to determine the optimal relativities.

When using the exponential loss function, the goal is now to minimize

$$Q_{\exp} = E\Big[\exp\big(-c(\Theta - r_L)\big)\Big] \tag{4.18}$$

under the financial balance constraint $E[r_L] = 1$. The parameter $c > 0$ determines the 'severity' of the bonus-malus scale. The loss (4.18) puts more weight on the errors resulting in an overestimation of the premium (i.e. $r_L > \Theta$) than on those coming from an underestimation. Consequently, the maluses are reduced, as well as the bonuses, since financial stability has been imposed.

Let us derive the general solution of (4.18).

Proposition 4.1 The solution of the constrained optimization problem (4.18) is

$$r_L^{\exp} = 1 + \frac{1}{c}\Big(E\big[\ln E[\exp(-c\Theta) \mid L]\big] - \ln E[\exp(-c\Theta) \mid L]\Big). \tag{4.19}$$

Proof
First, note that

$$\exp\big(c\, r_L^{\exp}\big) = \frac{\exp(c)\, \exp\Big(E\big[\ln E[\exp(-c\Theta) \mid L]\big]\Big)}{E\big[\exp(-c\Theta) \mid L\big]}.$$



Now, we have to minimize (4.18), which can be rewritten as

$$E\Big[\exp\big(-c(\Theta - r_L)\big)\Big] = E\Big[\exp\big(c(r_L - r_L^{\exp})\big)\Big] \times \exp(c)\, \exp\Big(E\big[\ln E[\exp(-c\Theta) \mid L]\big]\Big).$$

Invoking Jensen's inequality yields

$$E\Big[\exp\big(-c(\Theta - r_L)\big)\Big] \ge \underbrace{\exp\Big(c\, E\big[r_L - r_L^{\exp}\big]\Big)}_{=1} \times \exp(c)\, \exp\Big(E\big[\ln E[\exp(-c\Theta) \mid L]\big]\Big) = E\Big[\exp\big(-c(\Theta - r_L^{\exp})\big)\Big],$$

which ends the proof. □

Let us now compute the quantities in (4.19). Firstly,

$$\begin{aligned}
E\big[\exp(-c\Theta) \mid L = l\big] &= E\Big[E[\exp(-c\Theta) \mid \Lambda, L = l]\,\Big|\, L = l\Big]\\
&= \sum_k E\big[\exp(-c\Theta) \mid \Lambda = \lambda_k, L = l\big]\, \Pr[\Lambda = \lambda_k \mid L = l]\\
&= \sum_k \left(\int_0^{+\infty} \exp(-c\theta)\, \frac{\Pr[L = l \mid \Lambda = \lambda_k, \Theta = \theta]}{\Pr[L = l \mid \Lambda = \lambda_k]}\, dF_\Theta(\theta)\right) \frac{\Pr[L = l \mid \Lambda = \lambda_k]\, w_k}{\Pr[L = l]}\\
&= \frac{\sum_k w_k \int_0^{+\infty} \exp(-c\theta)\, \pi_l(\lambda_k\theta)\, dF_\Theta(\theta)}{\sum_k w_k \int_0^{+\infty} \pi_l(\lambda_k\theta)\, dF_\Theta(\theta)}
\end{aligned} \tag{4.20}$$

and secondly

$$E\Big[\ln E[\exp(-c\Theta) \mid L]\Big] = \sum_{l=0}^{s} \Pr[L = l]\, \ln E\big[\exp(-c\Theta) \mid L = l\big] \tag{4.21}$$

$$= \sum_{l=0}^{s} \Pr[L = l]\, \ln\left(\frac{\sum_k w_k \int_0^{+\infty} \exp(-c\theta)\, \pi_l(\lambda_k\theta)\, dF_\Theta(\theta)}{\Pr[L = l]}\right). \tag{4.22}$$

l=0



Remark 4.4 If no a priori ratemaking is in force, the expressions (4.20) and (4.22) are equal to those derived in Denuit & Dhaene (2001), that is,

$$E\big[\exp(-c\Theta) \mid L = l\big] = \frac{\int_0^{+\infty} \exp(-c\theta)\, \pi_l(\lambda\theta)\, dF_\Theta(\theta)}{\Pr[L = l]}$$

and

$$E\Big[\ln E[\exp(-c\Theta) \mid L]\Big] = \sum_{l=0}^{s} \Pr[L = l]\, \ln\left(\frac{\int_0^{+\infty} \exp(-c\theta)\, \pi_l(\lambda\theta)\, dF_\Theta(\theta)}{\Pr[L = l]}\right).$$
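These expressions lend themselves to the same brute-force quadrature as in the quadratic case. A sketch, reusing the grid quantities `theta`, `dx`, `dens` and the matrix `pi` of stationary distributions from the Section 4.5.2 snippet; the result should come close to the −1/+2 'without a priori ratemaking' column of Table 4.13 below.

```python
import numpy as np

def relativities_exp(c, theta, dx, dens, pi):
    """Exponential-loss relativities via (4.19) with the Remark 4.4
    expressions (no a priori ratemaking); grid inputs as sketched earlier."""
    p_L = (pi * dens[:, None]).sum(axis=0) * dx                  # Pr[L = l]
    m = (np.exp(-c * theta)[:, None] * pi * dens[:, None]).sum(axis=0) * dx / p_L
    return 1 + (p_L @ np.log(m) - np.log(m)) / c                 # formula (4.19)

print(np.round(100 * relativities_exp(1.0, theta, dx, dens, pi), 1))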

4.6.2 Fixing the Value of the Severity Parameter

Let us briefly explain a possible criterion to fix the value of the parameter $c$. First, note that

$$\lim_{c\to 0} r_l^{\exp} = E[\Theta \mid L = l] = r_l^{\mathrm{quad}},$$

where $r_l^{\mathrm{quad}}$ is the relativity obtained with a quadratic loss function. Letting $c$ tend to 0 thus yields Norberg's approach. In other words, the bonus-malus scale becomes more severe as $c$ decreases. Now, the ratio of the variances of the premiums obtained with an exponential and a quadratic loss is given by

$$\frac{V\big[r_L^{\exp}\big]}{V\big[r_L^{\mathrm{quad}}\big]} = \frac{1}{c^2}\, \frac{V\big[\ln E[\exp(-c\Theta) \mid L]\big]}{V\big[E[\Theta \mid L]\big]} = \rho\,\% \le 100\,\%.$$

The fact that the ratio of the variances is less than unity comes from the Jensen inequality. The idea is then to select the variance of the premium in the new system as a fraction $\rho$ of the corresponding variance under a quadratic loss (for instance $\rho = 25$, 50 or 75 %). Of course, other procedures can be applied. For instance, the actuary could select the value of $r_0$, or of $r_s$, and then compute $c$ in order to match this value.
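The last procedure reduces to a one-dimensional root search. In the sketch below, $c$ is chosen so that the level-0 relativity equals a 70 % target; the bracket is suggested by the published values $r_0 \approx 62.4\,\%$ for $c \to 0$ (Table 4.11) and $r_0 \approx 71.6\,\%$ for $c = 1$ (Table 4.13). `relativities_exp` and the grid inputs are the illustrative sketches of Section 4.6.1.

```python
from scipy.optimize import brentq

# find c such that r_0^exp(c) = 0.70; the bracket assumes the target lies
# between the c -> 0 and c = 1 values quoted above
c_star = brentq(lambda c: relativities_exp(c, theta, dx, dens, pi)[0] - 0.70,
                0.01, 1.0)
print(round(c_star, 3))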

4.6.3 Linear Relativities

In practice, a linear scale of the form $r_l^{\mathrm{lin}} = \alpha + \beta l$, $l = 0, 1, \ldots, s$, could be desirable. Let us now indicate how Gilde & Sundt's (1989) approach can be extended using exponential loss functions. The aim is now to minimize the objective function

$$\Phi(\alpha,\beta) = E\Big[\exp\big(-c(\Theta - \alpha - \beta L)\big)\Big]$$

under the financial balance constraint

$$E\big[r_L^{\mathrm{lin}}\big] = E[\Theta] = 1 \;\Leftrightarrow\; E[\Theta] = \alpha + \beta E[L] \;\Leftrightarrow\; \alpha = E[\Theta] - \beta E[L].$$

It suffices to minimize

$$\tilde\Phi(\beta) = E\Big[\exp\Big(-c\big(\Theta - E[\Theta] - \beta(L - E[L])\big)\Big)\Big].$$

Differentiating $\tilde\Phi$ with respect to $\beta$ and equating to zero yields

$$E\Big[\big(L - E[L]\big)\exp\Big(-c\big(\Theta - E[\Theta] - \beta(L - E[L])\big)\Big)\Big] = 0$$

$$\Leftrightarrow\; \int_0^{+\infty} \sum_{l=0}^{s} \big(l - E[L]\big)\exp\Big(-c\big(\theta - E[\Theta] - \beta(l - E[L])\big)\Big)\, \pi_l(\lambda\theta)\, dF_\Theta(\theta) = 0,$$

which has to be solved numerically to get the value of $\beta$ (and hence of $\alpha$). Convenient starting values for the numerical search are provided by (4.17).
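Since the objective is convex in $\beta$, any one-dimensional optimizer will do. A sketch with SciPy, using the same illustrative grid inputs as before; the search bounds are an assumption, meant as a bracket around the quadratic-loss starting value (4.17).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def linear_exponential(c, theta, dx, dens, pi):
    """Minimises Phi(beta) = E[exp(-c(Theta - E[Theta] - beta(L - E[L])))],
    then recovers alpha from the financial balance constraint."""
    l = np.arange(pi.shape[1])
    p_L = (pi * dens[:, None]).sum(axis=0) * dx
    EL = l @ p_L
    def phi(beta):
        expo = np.exp(-c * (theta[:, None] - 1.0 - beta * (l - EL)[None, :]))
        return (expo * pi * dens[:, None]).sum() * dx
    beta = minimize_scalar(phi, bounds=(0.0, 1.0), method="bounded").x
    return 1.0 - beta * EL, beta      # (alpha, beta), using E[Theta] = 1

print(linear_exponential(1.0, theta, dx, dens, pi))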

4.6.4 Numerical Illustration

In this section, we give numerical examples of the computation of bonus-malus scales when using an exponential loss function. We compare the results with those obtained previously.

In order to be able to compare the results, we have computed the relativities associated with the same severity factor $c = 1$. These results are given in Table 4.13 for the −1/top, −1/+2 and −1/+3 systems and Portfolio A. Specifically, Table 4.13 gives the relativities obtained with an exponential loss function with severity parameter $c = 1$, with and without a priori risk classification, for each of the three systems. As was the case with the quadratic loss function, we see that a priori risk classification reduces the dispersion of the relativities.

We observe that the relativities computed when no a priori ratemaking is in force give bigger bonuses when the policyholders are in level 0 but also impose bigger maluses in the other levels. Indeed, when no a priori ratemaking is in force, the a posteriori correction must be more severe in order to distinguish between good and bad drivers. On the contrary, when an a priori ratemaking is in force, the correction applied with the a posteriori tariff must be softer, because a greater part of the risk is already taken into account in the a priori ratemaking.

Influence of the Loss Function

We can also compare the results obtained when using the quadratic loss function and when using the exponential loss function (with different values of $c$). These are given in Tables 4.14, 4.15 and 4.16, respectively for the −1/top, −1/+2 and −1/+3 systems.


198 <strong>Actuarial</strong> <strong>Modelling</strong> <strong>of</strong> <strong>Claim</strong> <strong>Counts</strong><br />

Table 4.13 Numerical characteristics for the systems −1/top,<br />

−1/ + 2 and −1/ + 3 with a severity c = 1 and Port<strong>of</strong>olio A.<br />

−1/top<br />

Level l<br />

Relativity r exp<br />

l<br />

without a priori<br />

ratemaking<br />

Relativity r exp<br />

l<br />

with a priori<br />

ratemaking<br />

5 1613 % 1541%<br />

4 1480 % 1423%<br />

3 1371 % 1329%<br />

2 1281 % 1252%<br />

1 1204 % 1187%<br />

0 690% 723%<br />

−1/ + 2<br />

Level l<br />

Relativity r exp<br />

l<br />

without a priori<br />

ratemaking<br />

Relativity r exp<br />

l<br />

with a priori<br />

ratemaking<br />

5 2534 % 2293%<br />

4 2044 % 1898%<br />

3 1834 % 1726%<br />

2 1328 % 1300%<br />

1 1249 % 1234%<br />

0 716% 758%<br />

−1/ + 3<br />

Level l<br />

Relativity r exp<br />

l<br />

without a priori<br />

ratemaking<br />

Relativity r exp<br />

l<br />

with a priori<br />

ratemaking<br />

5 2093 % 1947%<br />

4 1878 % 1763%<br />

3 1372 % 1334%<br />

2 1283 % 1259%<br />

1 1207 % 1194%<br />

0 693% 732%<br />

The exponential relativities have been computed for a severity coefficient ranging from 0 to 5. The limit value $c = 0$ provides the same result as the quadratic loss function. We observe in Table 4.14 that an increasing value of $c$ leads to less dispersed relativities. The maximal penalty decreases as $c$ increases: keeping financial balance, increasing $c$ tends to soften the a posteriori corrections.
s<strong>of</strong>ten a posteriori corrections.



Table 4.14 Numerical characteristics for the system −1/top and Portfolio A.

Level l   Relativity r_l^quad   Relativity r_l^exp   Relativity r_l^exp   Relativity r_l^exp
          with a priori         with a priori        with a priori        with a priori
          ratemaking (c = 0)    ratemaking, c = 1    ratemaking, c = 2    ratemaking, c = 5

5             181.2 %               154.1 %              141.8 %              126.2 %
4             159.9 %               142.3 %              133.6 %              121.9 %
3             143.9 %               132.9 %              127.0 %              118.4 %
2             131.2 %               125.2 %              121.4 %              115.3 %
1             120.9 %               118.7 %              116.6 %              112.6 %
0              61.2 %                72.3 %               77.9 %               85.4 %

Table 4.15 Numerical characteristics for the system −1/+2 and Portfolio A.

Level l   Relativity r_l^quad   Relativity r_l^exp   Relativity r_l^exp   Relativity r_l^exp
          with a priori         with a priori        with a priori        with a priori
          ratemaking (c = 0)    ratemaking, c = 1    ratemaking, c = 2    ratemaking, c = 5

5             271.4 %               229.3 %              207.3 %              174.9 %
4             218.5 %               189.8 %              174.3 %              151.1 %
3             192.5 %               172.6 %              161.2 %              143.3 %
2             138.8 %               130.0 %              124.9 %              117.2 %
1             128.6 %               123.4 %              120.0 %              114.3 %
0              68.5 %                75.8 %               79.8 %               85.9 %

Table 4.16 Numerical characteristics for the system −1/+3 and Portfolio A.

Level l   Relativity r_l^quad   Relativity r_l^exp   Relativity r_l^exp   Relativity r_l^exp
          with a priori         with a priori        with a priori        with a priori
          ratemaking (c = 0)    ratemaking, c = 1    ratemaking, c = 2    ratemaking, c = 5

5             230.8 %               194.7 %              176.7 %              151.6 %
4             200.9 %               176.3 %              163.2 %              143.9 %
3             145.2 %               133.4 %              127.1 %              118.2 %
2             133.1 %               125.9 %              121.6 %              115.1 %
1             123.0 %               119.4 %              116.9 %              112.4 %
0              64.2 %                73.2 %               78.0 %               84.9 %



4.7 Special Bonus Rule

4.7.1 The Former Belgian Compulsory System

In this section we will concentrate on the former compulsory Belgian bonus-malus system, which all companies operating in Belgium were obliged to use from 1992 to 2002. The Belgian system consists of a scale of 23 levels (numbered from 0 to 22). A new driver starts in class 11 if he uses his vehicle for pleasure and commuting, and in class 14 if he uses his vehicle for business. Each claim-free year is rewarded by a bonus point. The first claim is penalized by four malus points and the subsequent ones by five malus points each.

According to the special bonus rule, a policyholder with four claim-free years cannot be in a class above 14. This restriction is a concession to insureds who file many claims in a few years and then suddenly improve; very few policyholders are ever able to take advantage of this rule.

Actually, the Belgian bonus-malus system is not Markovian due to the special bonus rule (i.e. due to the fact that policyholders occupying high levels are sent to level 14 after four claim-free years). Fortunately, it is possible to recover the memoryless property by introducing fictitious classes, splitting the levels 16 to 21 into subclasses depending on the number of consecutive years without accident.

4.7.2 Fictitious Levels

Splitting the levels 16 to 21 into sub-levels, depending on the number of consecutive years without accident, allows us to account for the special bonus rule. Let $n_j$ be the number of sub-levels to be associated with bonus level $j$. A level $j.i$ is to be understood as level $j$ with $i$ consecutive years without accident. The transition rules are completely defined in Table 4.17 and the different values for $n_j$ are given in Table 4.18. We take some liberty with the notation by using the value 0 for the subscript $i$ and by not using a subscript when $n_j = 1$.

4.7.3 Determination of the Relativities

The relativities $r_{j.i}$ are obtained by minimizing the expected squared difference between the true relative premium $\Theta$ and the relative premium $r_L$ applicable to the policyholder once the stationary state has been reached.

The current situation is more complicated, because some levels have to be constrained to have the same relativity. Indeed, the artificial levels $j.i$ have the property that

$$r_j = r_{j.1} = \cdots = r_{j.n_j}, \qquad j = 0, \ldots, s.$$

We have to minimize $E\big[(\Theta - r_L)^2\big]$ under these constraints. The solution is given by

$$r_j = \frac{\sum_k w_k \int_0^{+\infty} \theta \sum_{i=1}^{n_j} \pi_{j.i}(\lambda_k\theta)\, dF_\Theta(\theta)}{\sum_k w_k \int_0^{+\infty} \sum_{i=1}^{n_j} \pi_{j.i}(\lambda_k\theta)\, dF_\Theta(\theta)}. \tag{4.23}$$



Table 4.17 Transition rules of the Belgian bonus-malus system with fictitious levels accounting for the special bonus rule.

Class    Class after k accidents
         k = 0   k = 1   k = 2   k = 3   k = 4   k = 5

22       21.1    22      22      22      22      22
21.0     20.1    22      22      22      22      22
21.1     20.2    22      22      22      22      22
20.0     19.1    22      22      22      22      22
20.1     19.2    22      22      22      22      22
20.2     19.3    22      22      22      22      22
19.0     18.1    22      22      22      22      22
19.1     18.2    22      22      22      22      22
19.2     18.3    22      22      22      22      22
19.3     14      22      22      22      22      22
18.0     17      22      22      22      22      22
18.1     17.2    22      22      22      22      22
18.2     17.3    22      22      22      22      22
18.3     14      22      22      22      22      22
17       16      21.0    22      22      22      22
17.2     16.3    21.0    22      22      22      22
17.3     14      21.0    22      22      22      22
16       15      20.0    22      22      22      22
16.3     14      20.0    22      22      22      22
15       14      19.0    22      22      22      22
14       13      18.0    22      22      22      22
13       12      17      22      22      22      22
12       11      16      21.0    22      22      22
11       10      15      20.0    22      22      22
10        9      14      19.0    22      22      22
 9        8      13      18.0    22      22      22
 8        7      12      17      22      22      22
 7        6      11      16      21.0    22      22
 6        5      10      15      20.0    22      22
 5        4       9      14      19.0    22      22
 4        3       8      13      18.0    22      22
 3        2       7      12      17      22      22
 2        1       6      11      16      21.0    22
 1        0       5      10      15      20.0    22
 0        0       4       9      14      19.0    22

Note that it is easily seen that

$$r_j = \frac{\sum_{i=1}^{n_j} \Pr[L = j.i]\, r_{j.i}}{\sum_{i=1}^{n_j} \Pr[L = j.i]},$$

where the $r_{j.i}$s represent the non-constrained solution of the minimization of $E\big[(\Theta - r_L)^2\big]$.
where the r ji s represent the non-constrained solution <strong>of</strong> the minimization <strong>of</strong> [ − r L 2] .



Table 4.18 Sub-levels of the Belgian bonus-malus system.

 j    n_j
22     1
21     2
20     3
19     4
18     4
17     3
16     2
15     1
14     1
13     1
12     1
11     1
10     1
 9     1
 8     1
 7     1
 6     1
 5     1
 4     1
 3     1
 2     1
 1     1
 0     1

We also immediately verify that

$$\sum_{j=0}^{s} r_j \Pr[L = j] = 1,$$

which ensures that the bonus-malus scale is financially balanced at the stationary state.

4.7.4 Numerical Illustration

The numerical results for the Belgian bonus-malus system and for Portfolio A are displayed in Table 4.19. In this table, the special bonus rule has not been taken into account. Specifically, the values in the third column are computed with the help of (4.14) with $\hat a = 0.889$ and $\hat\lambda = 0.1474$. Those values were obtained by fitting a Negative Binomial distribution to the portfolio's observed claim frequencies. The fourth column is based on (4.13) with $\hat a = 1.065$ and the $\hat\lambda_k$s obtained from the a priori risk classification described in Table 2.7. The last column is computed with the help of (4.15).

Once the stationary state has been reached, more or less half of the policies occupy level 0 and enjoy the maximum discount. This is due to the fact that the transition rules of the Belgian system are not severe enough in comparison with the average claim frequency.



Table 4.19 Numerical characteristics for the Belgian bonus-malus system without the special bonus rule, computed on the basis of Portfolio A.

Level l   Pr[L = l]   r_l = E[Θ | L = l]   r_l = E[Θ | L = l]   E[Λ | L = l]
                      without a priori     with a priori
                      ratemaking           ratemaking

22          5.4 %         306.0 %              271.5 %             18.4 %
21          3.8 %         273.9 %              247.5 %             17.7 %
20          2.9 %         248.7 %              229.1 %             17.2 %
19          2.3 %         228.3 %              214.1 %             16.8 %
18          1.9 %         211.2 %              201.4 %             16.5 %
17          1.6 %         196.5 %              190.3 %             16.2 %
16          1.5 %         183.8 %              180.3 %             16.0 %
15          1.3 %         172.6 %              171.4 %             15.8 %
14          1.3 %         162.5 %              163.1 %             15.6 %
13          1.2 %         152.8 %              155.0 %             15.5 %
12          1.2 %         143.9 %              147.2 %             15.3 %
11          1.2 %         135.9 %              140.2 %             15.2 %
10          1.2 %         128.9 %              134.0 %             15.1 %
 9          1.4 %         119.8 %              125.5 %             14.9 %
 8          1.6 %         111.1 %              117.3 %             14.8 %
 7          1.7 %         104.8 %              111.4 %             14.7 %
 6          1.8 %          99.8 %              106.7 %             14.6 %
 5          1.8 %          95.6 %              102.8 %             14.6 %
 4          4.3 %          75.2 %               82.6 %             14.3 %
 3          3.8 %          72.7 %               80.2 %             14.3 %
 2          3.4 %          70.3 %               77.8 %             14.2 %
 1          3.1 %          68.1 %               75.5 %             14.2 %
 0         50.3 %          37.6 %               45.1 %             13.8 %

And the situation is even more serious when looking at the actual figures on the market. Indeed, the market average claim frequency is even smaller than the one of the analysed portfolio.

If we compute the relativities without taking into account the a priori ratemaking system, the relativities vary between 37.6 % for level 0 and 306.0 % for level 22. On the other hand, if we adapt the relativities to the a priori risk classification, these relativities vary between 45.1 % and 271.5 %; the severity of the a posteriori corrections is thus weaker in this case.

Tables 4.20, 4.21 and 4.22 display the results for the Belgian bonus-malus system when the special bonus rule is taken into account. Table 4.20 compares the probability mass function of $L$ when the special bonus rule is taken into account with the one obtained without the special bonus rule. This rule decreases the probabilities associated with the upper levels. The decrease for level 18 is even more apparent, since policyholders in that level benefit from the special bonus rule.

Table 4.21 gives the relativities computed with and without the special bonus rule. We see that the relativities are larger with the rule, since it boils down to softening the penalty in the case where claims are filed with the insurance company.
in the case where claims are filed to the insurance company.



Table 4.20 Distribution of L for the Belgian bonus-malus system, computed on the basis of Portfolio A.

Level l   Pr[L = l]
          Without special   With special
          bonus rule        bonus rule

22           5.4 %             4.3 %
21           3.8 %             3.0 %
20           2.9 %             2.2 %
19           2.3 %             1.7 %
18           1.9 %             0.9 %
17           1.6 %             1.0 %
16           1.5 %             1.0 %
15           1.3 %             1.0 %
14           1.3 %             2.1 %
13           1.2 %             1.9 %
12           1.2 %             1.8 %
11           1.2 %             1.7 %
10           1.2 %             1.6 %
 9           1.4 %             1.7 %
 8           1.6 %             1.9 %
 7           1.7 %             2.0 %
 6           1.8 %             2.0 %
 5           1.8 %             2.0 %
 4           4.3 %             4.5 %
 3           3.8 %             4.0 %
 2           3.4 %             3.6 %
 1           3.1 %             3.2 %
 0          50.3 %            50.9 %

We observe in Table 4.22 that the average a priori expected claim frequency $E[\Lambda \mid L = l]$ in level $l$ is always higher with the special bonus rule than without that rule. The effect is more pronounced in the highest levels of the scale and less pronounced in the lowest levels of the scale. This fact is obvious from the definition of the special bonus rule. The policyholders attaining the highest classes of the scale benefit from the special bonus rule. Those staying in these highest classes therefore show a higher expected frequency. Even below level 14 the effect remains true, because the policyholders have benefitted from it before attaining the lowest levels. Obviously, the effect is less and less pronounced at the bottom of the scale.

Some insurance companies use the bonus-malus scale as an underwriting tool. For instance, they systematically refuse drivers with a bonus-malus level $> 14$. Our calculations show that this is unreasonable, because drivers at level 15 are on average less risky than drivers at level 14.

We see from Table 4.19 that without the special bonus rule, the relativities are always increasing from level 0 to level 22. The same increasing pattern is observed for $E[\Lambda \mid L = l]$. When looking at the results for the bonus-malus system with the special bonus rule (Table 4.21), we observe that the relativities at levels 13–16 are not ordered any more.



Table 4.21 Relativities r_l = E[Θ | L = l] for the Belgian bonus-malus system computed on the basis of Portfolio A.

Level l   Relativities
          Without special   With special
          bonus rule        bonus rule

22          271.5 %           284.3 %
21          247.5 %           258.9 %
20          229.1 %           238.9 %
19          214.1 %           222.5 %
18          201.4 %           199.4 %
17          190.3 %           194.2 %
16          180.3 %           186.9 %
15          171.4 %           179.1 %
14          163.1 %           192.4 %
13          155.0 %           179.8 %
12          147.2 %           168.4 %
11          140.2 %           158.3 %
10          134.0 %           149.5 %
 9          125.5 %           138.6 %
 8          117.3 %           128.3 %
 7          111.4 %           120.7 %
 6          106.7 %           114.7 %
 5          102.8 %           109.6 %
 4           82.6 %            87.0 %
 3           80.2 %            84.1 %
 2           77.8 %            81.3 %
 1           75.5 %            78.6 %
 0           45.1 %            46.2 %

This can be explained by the fact that many drivers at level 14 have benefitted from the special bonus rule. It is clear that such a situation is not acceptable from a commercial point of view. We may constrain the scale to be linear in the spirit of Gilde & Sundt (1989). However, we propose instead a local adjustment to the scale, in order to keep $E\big[(\Theta - r_L)^2\big]$ as small as possible.

Let us constrain the scale to be linear between levels 13 and 16. We are looking for updated values $r'_j$, $j = 13, \ldots, 16$. They are such that $r'_j = r'_{j-1} + a$, $j = 14, 15, 16$, where $a = (r'_{16} - r'_{13})/3$. We also want to keep the financial equilibrium of the system. Therefore we impose a local equilibrium:

$$\sum_{j=13}^{16} r_j \Pr[L = j] = \sum_{j=13}^{16} r'_j \Pr[L = j].$$

Choosing $r'_{13} = 179.8\,\%$, we obtain $r'_{16} = 193.7\,\%$.
Choosing r ′ 13 = 1798 %, we obtain r ′ 16 = 1937%.



Table 4.22 Average a priori claim frequency E[Λ | L = l] in level l for the Belgian bonus-malus system, computed on the basis of Portfolio A.

Level l  Without special bonus rule  With special bonus rule

22  18.4%  18.7%
21  17.7%  18.0%
20  17.2%  17.4%
19  16.8%  17.0%
18  16.5%  16.4%
17  16.2%  16.3%
16  16.0%  16.1%
15  15.8%  16.0%
14  15.6%  16.3%
13  15.5%  16.0%
12  15.3%  15.8%
11  15.2%  15.6%
10  15.1%  15.4%
9   14.9%  15.2%
8   14.8%  15.0%
7   14.7%  14.9%
6   14.6%  14.8%
5   14.6%  14.7%
4   14.3%  14.4%
3   14.3%  14.3%
2   14.2%  14.3%
1   14.2%  14.2%
0   13.8%  13.8%

Now let us compare the value of the expected error Q = E[(Θ − r_L)²] for the original model, Q_1, and for the constrained model, Q_2. We get

Q_1 = 0.41121 and Q_2 = 0.41150.

This shows that the error induced by the commercial constraint is really small, so we may adapt the scale locally without resorting to a full linear scale constraint.

We can also perform the local minimization numerically without imposing a linear scale between levels 13 and 16. We use the following constraints:

r'_13 ≤ r'_14, r'_14 ≤ r'_15, r'_15 ≤ r'_16,

∑_{j=13}^{16} r_j π_j = ∑_{j=13}^{16} r'_j π_j,

r'_j ≥ 169%, j = 13, …, 16,

r'_j ≤ 194%, j = 13, …, 16.

We then obtain r'_13 = 179.8% and r'_14 = r'_15 = r'_16 = 187.8%. The value of Q is now 0.41135.
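This local optimization is easy to reproduce. The following minimal sketch (ours, not the book's; Python with numpy/scipy assumed, stationary probabilities and relativities for levels 13–16 under the special bonus rule read off the tables above and rounded) uses the fact that, since r_l = E[Θ | L = l], minimizing E[(Θ − r'_L)²] over levels 13–16 amounts to minimizing ∑_j π_j (r'_j − r_j)²:

    import numpy as np
    from scipy.optimize import minimize

    pi = np.array([0.019, 0.021, 0.010, 0.010])  # stationary probabilities, levels 13-16
    r = np.array([1.798, 1.924, 1.791, 1.869])   # Bayesian relativities, levels 13-16

    # Expected squared error added by replacing r with r'.
    objective = lambda rp: np.sum(pi * (rp - r) ** 2)

    cons = [
        {"type": "eq", "fun": lambda rp: pi @ rp - pi @ r},  # local financial equilibrium
        {"type": "ineq", "fun": lambda rp: np.diff(rp)},     # r'_13 <= r'_14 <= r'_15 <= r'_16
        {"type": "ineq", "fun": lambda rp: rp - 1.69},       # lower bound 169 %
        {"type": "ineq", "fun": lambda rp: 1.94 - rp},       # upper bound 194 %
    ]
    res = minimize(objective, x0=r, method="SLSQP", constraints=cons)
    print(np.round(res.x, 4))  # approximately [1.798, 1.878, 1.878, 1.878]

The optimizer pools the three upper levels at their π-weighted average, which reproduces the solution quoted above.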

4.7.5 Linear Relativities for the Belgian Scale

We shall now demonstrate how to get linear relativities for the Belgian scale. In addition to the constraint of building a linear scale, we add a new constraint: the artificial states must have the same relativity. The minimization problem then becomes

min_{α,β} E[(Θ − r^lin_L)²] = min_{α,β} E[(Θ − α − βL)²]

Table 4.23 Relativities r_l and r^lin_l for the Belgian bonus-malus system, taking into account the special bonus rule and computed on the basis of Portfolio A.

Level l  General scale  Linear scale

22  284.3%  267.4%
21  258.9%  257.4%
20  238.9%  247.5%
19  222.5%  237.6%
18  199.4%  227.6%
17  194.2%  217.7%
16  186.9%  207.7%
15  179.1%  197.8%
14  192.4%  187.9%
13  179.8%  177.9%
12  168.4%  168.0%
11  158.3%  158.0%
10  149.5%  148.1%
9   138.6%  138.2%
8   128.3%  128.2%
7   120.7%  118.3%
6   114.7%  108.3%
5   109.6%  98.4%
4   87.0%   88.5%
3   84.1%   78.5%
2   81.3%   68.6%
1   78.6%   58.6%
0   46.2%   48.7%



subject to

r_l = r_{l1} = r_{l2} = ··· = r_{l n_l}, l = 0, 1, …, s.

The solution to this problem is the same as above; we merely have to replace π_l by ∑_{i=1}^{n_l} π_{li}. The results are displayed in Table 4.23. The constant step between two levels of the linear scale is equal to 9.9%, and the value of the mean square error is Q = 0.41779. The major advantage of the linear scale is that it gives a system that is commercially more acceptable.
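For completeness, the unconstrained linear scale admits a closed-form solution: minimizing E[(Θ − α − βL)²] is an ordinary least-squares problem, so β = Cov[Θ, L]/V[L] and α = E[Θ] − βE[L], and both quantities involve only the π_l s and the r_l s because E[Θ | L = l] = r_l. A minimal sketch (ours, not the book's code; `pi` and `r` are assumed arrays indexed by level, with the artificial states already aggregated as explained above):

    import numpy as np

    def linear_relativities(pi, r):
        """Return r_lin[l] = alpha + beta*l minimizing E[(Theta - alpha - beta*L)^2]."""
        levels = np.arange(len(pi))
        EL = pi @ levels                     # E[L]
        ETheta = pi @ r                      # E[Theta] = E[ E[Theta | L] ]
        # Cov(Theta, L) = Cov(E[Theta | L], L) = sum_l pi_l * l * r_l - E[L] E[Theta]
        beta = (pi @ (levels * r) - EL * ETheta) / (pi @ levels**2 - EL**2)
        alpha = ETheta - beta * EL
        return alpha + beta * levels

Applied to the Belgian scale, the slope beta plays the role of the constant step reported above.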

4.8 Change of Scale

4.8.1 Migration from One Scale to Another

Since the 1990s, insurance markets in the EU have been deregulated and more competition is allowed. Two related problems arise with the deregulation of bonus-malus systems. The first one consists of transferring the policyholders to the new scales. The second one is more difficult: it consists of transferring a new policyholder to the scale of the company knowing his level in the scale of his previous insurer.

The aim of the present section is to show how to develop rules allowing the transfer of a policyholder to a bonus-malus scale knowing his level in his previous bonus-malus scale. The a posteriori probability density function of Θ given the level L occupied in the bonus-malus scale is given by

f_{Θ|L=l}(θ) = Pr[L = l | Θ = θ] f_Θ(θ) / Pr[L = l] = ∑_k w_k π_l(λ_k θ) f_Θ(θ) / ∫_0^{+∞} ∑_k w_k π_l(λ_k ξ) dF_Θ(ξ),

where w_k is the weight of risk class k, λ_k its a priori expected claim frequency, and π_l(·) the stationary probability of occupying level l.

Because we want to move a policyholder from one scale to the other, we should try to put the policyholder at a level which is as close as possible to his level in his original bonus-malus scale. By 'close', we mean here having the a posteriori random effect as close as possible.

4.8.2 Kolmogorov Distance

In addition to the total variation distance d_TV used previously, we also need the Kolmogorov distance. The Kolmogorov (or uniform) metric, based on the well-known Kolmogorov-Smirnov statistic (associated with the goodness-of-fit test of the same name), is defined as follows: the Kolmogorov distance d_K between the random variables X and Y is given by

d_K(X, Y) = sup_{t ∈ ℝ} |F_X(t) − F_Y(t)|.    (4.24)

Given two random variables X and Y, we have that d_K(X, Y) ≤ d_TV(X, Y). This result is an immediate consequence of (4.11) since F_X(t) = Pr[X ∈ (−∞, t]].



4.8.3 Distances between the Random Effects

A first measure may be to compare the expected a posteriori random effects, which are actually the relative premiums attached to the levels of the bonus-malus scale:

r_{L=l} = E[Θ | L = l].

So the closest level l_j in scale 2 to the level l_i occupied in scale 1 is given by

argmin_{l_j} (r_{L_1=l_i} − r_{L_2=l_j})².

This rule simply amounts to placing the policyholder in the new scale at the level with the closest relativity to the one applicable in the previous scale. Because most commercial scales are normalized (to associate a unit relativity with the entry level), this means that the insurer has to compute the relativities for both scales. The implicit assumption is that the new entrant has the same characteristics as the policyholders in the portfolio (no adverse selection being allowed).

Another measure of discrepancy consists of comparing the distribution functions of the a posteriori random effects. This can be done by using the Kolmogorov distance d_K or the total variation distance d_TV:

d_K(Θ|L_1 = l_i, Θ|L_2 = l_j) = max_θ |Pr[Θ ≤ θ | L_1 = l_i] − Pr[Θ ≤ θ | L_2 = l_j]|,

d_TV(Θ|L_1 = l_i, Θ|L_2 = l_j) = ∫_0^{+∞} |f_{Θ|L_1=l_i}(θ) − f_{Θ|L_2=l_j}(θ)| dθ.

Summarizing, a policyholder being at level l_i in scale 1 and moving to scale 2 will be put in the level l_j of scale 2 that minimizes one of the following distances:

(i) d_E(Θ|L_1 = l_i, Θ|L_2 = l_j) = (r_{L_1=l_i} − r_{L_2=l_j})²;
(ii) d_K(Θ|L_1 = l_i, Θ|L_2 = l_j);
(iii) d_TV(Θ|L_1 = l_i, Θ|L_2 = l_j).
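Numerically, the three distances can be approximated on a common discretization grid for θ once the two a posteriori densities have been computed. A minimal sketch (ours, not the book's code; `theta` is an increasing grid, `f1` and `f2` are the densities of Θ given the level occupied in scale 1 and scale 2 respectively):

    import numpy as np

    def d_E_K_TV(theta, f1, f2):
        """Approximate the three distances between two posterior distributions."""
        w = np.gradient(theta)                 # quadrature weights on the grid
        r1 = np.sum(theta * f1 * w)            # relativity E[Theta | L1 = li]
        r2 = np.sum(theta * f2 * w)            # relativity E[Theta | L2 = lj]
        dE = (r1 - r2) ** 2
        F1 = np.cumsum(f1 * w)                 # approximate distribution functions
        F2 = np.cumsum(f2 * w)
        dK = np.max(np.abs(F1 - F2))
        dTV = np.sum(np.abs(f1 - f2) * w)
        return dE, dK, dTV

The policyholder is then placed at the level l_j minimizing the chosen distance; this is how the transition rules displayed in the next section can be obtained.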

4.8.4 Numerical Illustration

In this section we use Portfolio A and its risk classification described in Table 2.7. Let us assume that we want to move policyholders between the following two bonus-malus scales:

• the −1/top scale with 6 levels, numbered 0 to 5;
• the −1/+2 scale of Taylor (1997) with 9 levels, whose transition rules are given in Table 4.24.



Table 4.24 Transition rules for the −1/+2 scale of Taylor (1997) with 9 levels.

Starting  Level occupied if k claims are reported
level     k = 0  k = 1  k = 2  k = 3  k ≥ 4

8         7      8      8      8      8
7         6      8      8      8      8
6         5      8      8      8      8
5         4      7      8      8      8
4         3      6      8      8      8
3         2      5      7      8      8
2         1      4      6      8      8
1         0      3      5      7      8
0         0      2      4      6      8
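The rules in Table 4.24 follow the usual −1/+2 pattern: one level down after a claim-free year, two levels up per reported claim, bounded by levels 0 and 8. A minimal encoding (ours, not from the book) reproduces the whole table:

    def next_level(level, claims, s=8):
        """Level occupied next year in the -1/+2 scale with levels 0..s."""
        if claims == 0:
            return max(0, level - 1)           # one level down per claim-free year
        return min(s, level + 2 * claims)      # two levels up per reported claim

    # Rebuild Table 4.24 row by row, from starting level 8 down to 0:
    for l in range(8, -1, -1):
        print(l, [next_level(l, k) for k in range(5)])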

Table 4.25 Distances between [Θ | L_1 = l_1] and [Θ | L_2 = l_2] for the three metrics.

d_K
l_1\l_2  0      1      2      3      4      5      6      7      8
0        0.042  0.377  0.404  0.574  0.613  0.701  0.740  0.790  0.825
1        0.323  0.016  0.044  0.245  0.299  0.421  0.483  0.562  0.625
2        0.357  0.061  0.036  0.200  0.254  0.376  0.439  0.520  0.585
3        0.396  0.110  0.080  0.153  0.204  0.325  0.388  0.471  0.539
4        0.439  0.168  0.137  0.105  0.151  0.267  0.330  0.414  0.483
5        0.488  0.237  0.207  0.061  0.097  0.202  0.262  0.344  0.414

d_TV
l_1\l_2  0      1      2      3      4      5      6      7      8
0        0.089  0.754  0.808  1.148  1.225  1.402  1.481  1.580  1.651
1        0.646  0.081  0.107  0.490  0.599  0.841  0.966  1.125  1.249
2        0.715  0.123  0.094  0.402  0.508  0.751  0.878  1.041  1.172
3        0.791  0.220  0.160  0.320  0.411  0.649  0.777  0.943  1.078
4        0.877  0.337  0.274  0.270  0.324  0.535  0.660  0.827  0.966
5        0.976  0.475  0.413  0.281  0.286  0.426  0.527  0.688  0.834

d_E
l_1\l_2  0      1      2      3      4      5      6      7      8
0        0.001  0.308  0.391  1.077  1.415  2.329  3.093  4.349  5.908
1        0.314  0.002  0.001  0.194  0.351  0.863  1.350  2.215  3.362
2        0.440  0.021  0.006  0.114  0.239  0.682  1.120  1.917  2.993
3        0.624  0.074  0.041  0.044  0.132  0.489  0.868  1.584  2.572
4        0.902  0.186  0.131  0.003  0.041  0.291  0.596  1.207  2.085
5        1.377  0.430  0.343  0.030  0.000  0.100  0.301  0.765  1.489
5 1377 0430 0343 0030 0000 0100 0301 0765 1489



Table 4.25 provides the distances between the conditional random effects for our three metrics. On the basis of these distances, we see that the minima are attained for the following transition rules. Transition rules from scale 1 to scale 2 are

       d_K  d_TV  d_E
l_1    l_2  l_2   l_2
0      0    0     0
1      1    1     2
2      2    2     2
3      2    2     2
4      3    3     3
5      3    3     4

and transition rules from scale 2 to scale 1 are

       d_K  d_TV  d_E
l_2    l_1  l_1   l_1
0      0    0     0
1      1    1     1
2      2    2     1
3      5    4     4
4      5    5     5
5      5    5     5
6      5    5     5
7      5    5     5
8      5    5     5

Let us now take into account the a priori characteristics of the driver. It will be seen that good and bad drivers are not placed in the same way when they are transferred from one scale to the other. The transition rules for an a priori bad driver (specifically, a male driver aged more than 30 years, with premium split, having private use of the car and living in an urban environment, whose a priori expected claim frequency is 0.1794) are given in the next tables: from scale 1 to scale 2

       d_K  d_TV  d_E
l_1    l_2  l_2   l_2
0      0    0     0
1      2    1     2
2      2    2     2
3      2    2     3
4      3    3     3
5      3    4     4



and from scale 2 to scale 1

       d_K  d_TV  d_E
l_2    l_1  l_1   l_1
0      0    0     0
1      1    1     1
2      1    1     1
3      5    4     4
4      5    4     4
5      5    5     5
6      5    5     5
7      5    5     5
8      5    5     5

We here compare the distance between [Θ | L_1 = l_i, λ = 0.1794] and [Θ | L_2 = l_j, λ = 0.1794]. We have

f_{Θ|L=l,Λ=λ}(θ) = π_l(λθ) f_Θ(θ) / ∫_0^{+∞} π_l(λξ) f_Θ(ξ) dξ,

the a posteriori density of the random effect given both the level occupied and the a priori expected claim frequency.

The transition rules for a good policyholder (specifically, a male driver aged more than 30 years, with upfront premium, having private use of the car and living in a rural environment, whose a priori expected claim frequency is 0.0928) are given in the next tables: from scale 1 to scale 2

       d_K  d_TV  d_E
l_1    l_2  l_2   l_2
0      0    0     0
1      1    1     1
2      1    1     2
3      2    2     2
4      2    2     2
5      2    2     3

and from scale 2 to scale 1

       d_K  d_TV  d_E
l_2    l_1  l_1   l_1
0      0    0     0
1      2    2     1
2      2    3     2
3      5    5     5
4      5    5     5
5      5    5     5
6      5    5     5
7      5    5     5
8      5    5     5

We here compare the distance between [Θ | L_1 = l_i, λ = 0.0928] and [Θ | L_2 = l_j, λ = 0.0928].



We observe that the different metrics we have chosen do not provide very different results. Because the expected a posteriori random effect has a financial meaning (i.e. it is the multiplier of the average cost giving the a posteriori premium), we may be tempted to choose its corresponding metric to transfer a policyholder from one scale to the other. Note also that the a priori characteristics of the driver influence the way the policy is transferred from one scale to the other. This results in a number of rules according to the risk classification scheme applied by the insurance company.

Remark 4.5 Transferring a policyholder from the bonus-malus scale of a given insurer to the bonus-malus scale of another insurer remains a more complicated task. Indeed, the actuary needs to know the a posteriori random effect in both situations. However, the a priori random effect may differ because of another type of a priori tariff or because of adverse selection. The distance to minimize then extends to d(Θ_1 | L_1 = l_1, Θ_2 | L_2 = l_2).

4.9 Dependence in Bonus-Malus Scales

It is clear that the sequence {L_1, L_2, …} is CIS, but we do not have in general that Θ given L = l increases with l. The reason is as follows: because of the finite number of levels, it does not automatically follow that a policyholder in a higher level filed more claims in the past. To show this, consider the scale −1/top. A policyholder in level 3 in year 3 may occupy that level having filed two claims in year 1 and no claim thereafter. On the contrary, a policyholder in level 4 could have filed one claim in year 2. From (3.18) we conclude that Θ should be larger (in the ≼_ST-sense) for policyholder 1 than for policyholder 2, despite the fact that policyholder 1 is in an inferior level.

As a consequence, we cannot be sure that the Bayesian relativities obtained with a quadratic loss function are increasing with the level occupied in the scale. This is why linear relativities are so useful in practice.

Remark 4.6 This counterintuitive fact becomes reasonable if we allow the random effects to vary in time. Provided the autocorrelogram is decreasing, old claims have less predictive power than recent ones. Then, we have to compare two old claims to one recent claim, and it becomes less obvious that policyholder 1 is more dangerous than policyholder 2.

4.10 Further Reading and Bibliographic Notes

Chapter 7 in Rolski et al. (1999) offers an excellent introduction to Markov chains, with applications to bonus-malus systems. Several parts of this chapter are directly inspired by this source. For the most part, this chapter is based on Pitrebois, Denuit & Walhin (2003b), following on from Taylor (1997) for the extension of Norberg's (1976) pioneering work on segmented tariffs; on Pitrebois, Denuit & Walhin (2004) for the linear relativities; on Pitrebois, Walhin & Denuit (2006c) for Section 4.8; and on Pitrebois, Denuit & Walhin (2003a) for the Belgian bonus-malus scale and its special bonus rule.

Norberg (1976), Borgan, Hoem & Norberg (1981) and Gilde & Sundt (1989) assumed that the bonus-malus system forms a first order Markov chain. Centeno & Andrade e Silva (2002) considered bonus-malus systems that are not first order Markovian



processes, but that can be made Markovian by increasing the number of states, as we did in Section 4.7.

The notion of distance found a second life in probability in the form of metrics in spaces of random variables and their probability distributions. The study of limit theorems (among other questions) made it necessary to introduce functionals evaluating the nearness of probability distributions in some probabilistic sense. In this chapter, the total variation and Kolmogorov distances have been used in connection with bonus-malus systems. We refer the interested reader to Chapter 9 of Denuit et al. (2005) for a detailed account of probability metrics and their applications in risk theory.

The premium relativities are traditionally computed with the help of a quadratic loss function, in the vein of Norberg (1976). Other loss functions nevertheless also deserve consideration, such as the exponential loss function applied by Denuit & Dhaene (2001) to the computation of the relativities. For the sake of completeness, let us mention that the absolute value loss function has been successfully applied to the determination of the relativities by Heras, Vilar & Gil (2002) and Heras, Gil, Garcia-Pineda & Vilar (2004).

If, in a given market, companies start to compete on the basis of bonus-malus systems, many policyholders could leave the portfolio after the occurrence of an accident, in order to avoid the resulting penalties. Those attritions can be incorporated in the model by adding a supplementary level to the Markov chain (in the spirit of Centeno & Andrade e Silva (2001)). Transitions from a level of the bonus-malus scale to this state represent a policyholder leaving the portfolio, whereas transitions from this state to any level of the bonus-malus scale mean that a new policy enters the portfolio.

It has been assumed throughout this chapter that the unknown expected claim frequencies were constant and that the random effects representing hidden characteristics were time-invariant. Dropping these assumptions makes the determination of the relativities much harder. We refer the interested reader to Brouhns, Guillén, Denuit & Pinquet (2003) for a thorough study of this general situation. A fundamental difference with the traditional approaches is that we lose the homogeneity of the chain in a dynamic segmented environment. Indeed, if the observable characteristics of the policyholders are allowed to vary in time, the claim frequencies are no longer constant and the trajectory of the policyholder in the bonus-malus scale is no longer described by a homogeneous Markov chain, but rather by a non-homogeneous one. Consequently, the classical techniques based on stationary distributions cannot be applied to the problem of determining the relativities. Brouhns, Guillén, Denuit & Pinquet (2003) propose a computer-intensive method to calibrate bonus-malus scales. Their paper clearly illustrates the strong complementarity of a priori and a posteriori ratemakings. The main originality of their approach is to compare, on the basis of real data, four different credibility models: static versus dynamic heterogeneity, with and without recognizing a priori risk classification. The impact of the different assumptions becomes clear in the numerical illustrations.

Andrade e Silva & Centeno (2005) suggested the use of geometric relativities instead of linear ones. Specifically, under a quadratic loss function, the relativity associated with level l becomes r^geo_l = αγ^l, with α and γ positive. As pointed out in Remark 4.3 for linear relativities, finding the geometric relativities amounts to finding the best approximation αγ^l to the Bayesian relativities, that is, α and γ solve

min_{α,γ} E[(E[Θ | L] − αγ^L)²].



No explicit expressions are available for the optimal α and γ, which must be determined numerically.

As pointed out by Subramanian (1998), the market shares of competitors in a given market can be strongly affected when some insurers adopt an aggressive competitive behaviour by modifying their bonus-malus systems. In a deregulated market, insurers have an incentive to innovate in their pricing decisions by partitioning their portfolios (a priori ratemaking) and by designing new bonus-malus systems (a posteriori ratemaking). Viswanathan & Lemaire (2005) examined the evolution of market shares and claim frequencies in a two-company market, when one insurer breaks off the existing stability by introducing a super discount class in its bonus-malus system.

To end with, let us point out a final remark of primary importance. Merit-rating structures in automobile insurance require the insured to decide whether to file a claim for an accident when he is at fault. Since the penalties are independent of the claim amounts, one could imagine that some policyholders prefer to carry the cost of the accident themselves in order to avoid the premium increase in the future. Therefore, the data the actuary has at his disposal are contingent on the actual bonus-malus system and are 'censored' in a very complicated way. Hence, policyholders might modify their behaviour when a new bonus-malus system is introduced, resulting in an increasing (or decreasing) number of reported claims. This is a particularly important side-effect when a bonus-malus system is modified. We will come back to this issue in Chapter 5.

The numerical results presented in this chapter can be obtained using software such as SAS® or R. There is also a specific software package, called BM-builder, which works in the SAS® environment and allows for actuarial computations related to bonus-malus scales. It has been developed by ReacFin SA, a spin-off of the Université Catholique de Louvain, Louvain-la-Neuve, Belgium, created in January 2004 by the authors. Reacfin's aim is to provide actuarial solutions to its clients. Its strong link with the university guarantees the use of up-to-date techniques. BM-builder computes the relativities attached to the different levels of a bonus-malus scale, taking into account the actual structure of the insurance portfolio. In that respect, it properly integrates the interactions between a priori and a posteriori ratemakings and allows for an efficient ratemaking. For more details, see http://www.reacfin.com.


Part III<br />

Advances in Experience Rating



5

Efficiency and Bonus Hunger

5.1 Introduction

5.1.1 Pure Premium

In nonlife business, the pure premium is the expected cost of all the claims that the policyholder will file during the coverage period (under the assumption of the Law of Large Numbers, i.e. a large portfolio comprising independent and identically distributed risks). The actuarial ratemaking is based on a claim frequency distribution and a loss distribution. The average claim frequency is defined as the number of incurred claims per unit of earned exposure (the exposure is usually measured in car-years for motor insurance). The average loss severity is the average payment per incurred claim.

In this chapter, as well as in the next one, we need to consider the claim severities. Even if the premium updates induced by bonus-malus systems only depend on the number of claims at fault filed with the insurance company, the design of efficiency measures, as well as the study of the bonus-hunger phenomenon (i.e. the tendency of policyholders to self-defray minor accidents to avoid premium surcharges), require accurate modelling of the cost of the claims. This topic is dealt with in Section 5.2.

5.1.2 Statistical Analysis of Claim Costs

The computation of the pure premium relies on a statistical model incorporating all the available information about the risk. The technical tariff aims to evaluate as accurately as possible the pure premium for each policyholder via regression techniques. It is well known that market premiums may differ from those computed by actuaries. In that respect, the overall market position of the company compared to its competitors with regard to growth and pricing is crucial. This chapter is devoted to the technical tariff only.




The first step of any actuarial analysis consists in displaying descriptive statistics in order to figure out the composition of the portfolio and the marginal impact of the rating factors. In a second stage, the available explanatory variables are incorporated into the policyholders' expected claim frequencies and severities with the help of generalized regression models.

5.1.3 Large Claims and Extreme Value Theory

Large claims generally affect liability coverages. These major accidents require a separate analysis. The reason for analysing small (or moderate) and large losses separately is that no standard parametric model seems to emerge as providing an acceptable fit to both small and large claims. The main goal is then to determine an optimal threshold separating the two types of losses.

Extreme Value Theory and Generalized Pareto distributions can be used to set the value of this threshold. Specifically, graphical tools including the Pareto index plot and the Gertensgarbe plot can be used to estimate the threshold defining the large losses. In the former case, the maximum likelihood estimator of the Pareto tail parameter is computed for increasing thresholds until it becomes approximately constant. The Gertensgarbe plot is based on the assumption that the optimal threshold can be found as a change point in the ordered series of claim costs, and that this change point can be identified, by means of a sequential version of the Mann-Kendall test, as the intersection point between the normalized progressive and retrograde rank statistics.

5.1.4 Measuring the Efficiency of the Bonus-Malus Scales

As explained in the preceding chapters, the basis of fair ratemaking in motor insurance is the fact that each policyholder is charged a premium that is proportional to the risk that he actually represents. The accident proneness of a policyholder being represented by the relative risk parameter Θ, we expect that a relative change in Θ will have the same relative impact on the premium paid to the insurance company. If this is the case, then the system is said to be fully efficient.

Section 5.3 reviews two concepts of efficiency: Loimaranta efficiency and De Pril efficiency. Both intend to measure how the bonus-malus system responds to a change in the riskiness of the driver. Loimaranta efficiency is solely based on the stationary probabilities, whereas De Pril efficiency is a transient concept and uses the time value of money (through discounting).

5.1.5 Bonus Hunger and Optimal Retention

Since the penalty induced by the bonus-malus system is independent of the claim amount, a crucial issue for the policyholder is to decide whether it is profitable or not to report small claims (in order to avoid an increase in premium). Cheap claims are likely to be defrayed by the policyholders themselves, and not to be reported to the company. This phenomenon, known as the hunger for bonus after Philipson (1960), is studied in Section 5.4.

Section 5.4.1 is devoted to the censorship of claim amounts and claim frequencies arising from bonus-malus systems. Specifically, a statistical model is specified that takes into account the fact that only 'expensive' claims are reported to the insurance company. We will consider that each policyholder has his own unknown retention limit, depending on the level occupied inside the bonus-malus scale as well as on observable characteristics (like age or gender, for instance). The policyholder reports the accident to the company only if its cost exceeds the retention limit. A regression model accounting for the fact that we observe the maximum of the accident cost and the retention limit is then fitted to the observed claim data. We then recover probability models for the accident costs (whereas formerly, we modelled claim costs). The claim frequencies can also be corrected in order to obtain accident frequencies (this is done in Section 5.4.2).

Section 5.4.3 examines the optimal claiming strategy that should be followed by rational policyholders. A strategy for each policyholder can be defined by a vector (rl_0, …, rl_s)^T, where rl_l is the retention limit for the policyholder occupying level l in the bonus-malus scale. This means that the cost of any accident of amount less than rl_l is borne by the policyholder in level l. The claims causing higher costs are reported to the insurer. The problem is to determine optimal values for the rl_l s. This can be done using the Lemaire algorithm. The optimal retention limits depend on the level occupied in the scale, on the annual expected claim frequency, as well as on a discount rate. Note that the Lemaire algorithm gives the optimal retention limit obtained by means of dynamic programming. The resulting strategy should be adopted by rational policyholders, but may differ from the one empirically observed in insurance portfolios. The optimal retentions obtained from the Lemaire algorithm can also be seen as a measure of the toughness of the bonus-malus system: a system that induces large rl_l s is more severe than another one yielding moderate retention limits. As such, the rl_l s can also be used to measure the efficiency of the bonus-malus system.

The claim costs play an important role in this chapter. As explained above, modelling claim sizes is a difficult issue because of the strong heterogeneity and of the presence of large claims. Nevertheless, it seems reasonable to assume that large claims will always be reported to the company, so that only moderate claims are subject to bonus hunger.

In this chapter, we work within a given bonus-malus scale, whose levels are numbered from 0 to s, with fixed relativities r_0, …, r_s (the r_l s have been computed as explained in Chapter 4, or have been derived from marketing considerations, but are treated as given for the whole chapter).

5.1.6 Descriptive Statistics for Portfolio C

The numerical illustrations of this chapter are based on the observation of a Belgian motor third party liability insurance portfolio during the year 1997. This portfolio, henceforth referred to as Portfolio C, comprised 163 660 policies.

The following variables are available for Portfolio C. As far as policyholders' characteristics are concerned, we know the Gender (male or female), the age (variable Ageph, four classes: 18–24, 25–30, 31–60 and > 60), the place of residence (variable City, rural or urban) and the Use of the car (private or professional). Concerning the insured vehicle, we know its age (variable Agev, four classes: 1–2, 3–5, 5–10 and > 10 years), the type of Fuel (petrol or gasoil) and its Power (three classes: < 66 kW, 66–110 kW and > 110 kW). About the type of contract, we know whether the premium payment has been split up (variable Premium split, with payment once a year, or more than once a year) and the type of Coverage (motor third party liability only, or motor third party liability together with some more optional coverages). In addition to these covariates, we know the number of claims filed by each policyholder during 1997, the exposure-to-risk from which these claims originated, as well as the resulting total claim amount. The information recorded in the data base dates from the end of June 1998 (6 months after the end of the observation period). Hence, most of the 'small' claims are settled and their final cost is known. However, for the large claims, we work here with incurred losses (payments made plus reserve).

Table 5.1 Descriptive statistics of the claim costs (only strictly positive values) for Portfolio C.

Statistic           Value
# observations      18 176
Minimum             27.02
Maximum             1 989 567.9
Mean                1810.63
Standard deviation  17 577.83
25th percentile     145.02
Median              598.17
75th percentile     1464.75
90th percentile     3021.87
95th percentile     4268.06
99th percentile     19 893.68
Skewness            85.08

Descriptive statistics for the claim costs are displayed in Table 5.1: we have at our disposal 18 176 observed individual claim costs, ranging from €27.02 to almost €2 000 000, with a mean of €1810.63. We see in Table 5.1 that 25% of the recorded claim costs are below €145.02, that half of them are smaller than €598.17, and that 90% of them are less than €3021.87. The interquartile range is €1319.73. The observed claim cost distribution is highly asymmetric, with a skewness coefficient of about 85.

5.2 Modelling Claim Severities

5.2.1 Claim Severities in Motor Third Party Liability Insurance

In nonlife business, the pure premium is the expected cost of all the claims that policyholders will file during the coverage period (under the assumption of the Law of Large Numbers). Let S_i, i = 1, …, n, be the total claim amount relating to policy number i. The S_i s are assumed to be independent and identically distributed with common mean μ. The Law of Large Numbers ensures that

Pr[ (1/n) ∑_{i=1}^{n} S_i → μ as n → +∞ ] = 1.

Under the conditions of the Law of Large Numbers, the pure premium is thus the expected claim amount.



The modelling of claim costs is much more difficult than that of claim frequencies. There are several reasons for this. In liability insurance, claim costs are often a mix of moderate and large claims. Usually, 'large claim' means exceeding some threshold, depending on the portfolio under study. This threshold can be selected using techniques from Extreme Value Theory, as described in Cebrian, Denuit & Lambert (2003). Large liability claims need several years to be settled: only estimates of the final cost appear in the file until the claim is closed. Moreover, the statistics available to fit a model for claim severities are much more limited than for claim frequencies, since only 10% of the policies in the portfolio produced claims. Finally, the cost of an accident is for the most part beyond the control of a policyholder, since the payments of the insurance company are determined by third-party characteristics. The degree of care exercised by a driver mostly influences the number of accidents, but to a much lesser extent the cost of these accidents. The information contained in the available observed covariates is usually much less relevant for claim sizes than for claim counts.

In liability insurance, the settlement of larger claims often requires several years. Much of the data available for the recent accident years will therefore be incomplete, in the sense that the final claim cost will not be known. In this case, loss development factors can be used to obtain a final cost estimate. The average loss severity is then based on incurred loss data. In contrast to paid loss data (which are purely objective, representing the actual payments made by the company), incurred loss data include subjective reserve estimates.

The total claim amount generated by policyholder i covered for motor third party liability can be represented as

S_i = ∑_{k=1}^{N_i^small} C_ik + ∑_{k=1}^{N_i^large} L_ik    (5.1)

where

N_i^small is the number of standard (or small) claims filed by policyholder i;
C_ik is the cost of the kth standard claim filed by policyholder i;
N_i^large is the number of large claims filed by policyholder i;
L_ik is the cost of the kth large claim filed by policyholder i.

All these random variables are assumed to be mutually independent. The random variables N_i^small and N_i^large are analysed as explained in Chapters 1–2. Here, we explain how to model the C_ik s and the L_ik s. The first question to be addressed is how to separate standard claims from large claims.

5.2.2 Determining the Large Claims with Extreme Value Theory

Extreme Claim Amounts

Gamma, LogNormal and Inverse Gaussian distributions (as well as other parametric models) have often been used by actuaries to fit claim sizes. However, when the main interest is in the tail of the loss severity distribution, it is essential to have a good model for the largest claims. Distributions providing a good overall fit can be particularly bad at fitting the tails. Extreme Value Theory and Generalized Pareto distributions focus on the tails, being supported by strong theoretical arguments.



We only give hereafter a short non-technical description of the fundaments of Extreme Value Theory; for more details, we refer the reader to Beirlant et al. (2004). Considering a sequence of independent and identically distributed random variables (claim severities, say) X_1, X_2, X_3, …, most classical results from probability and statistics that are relevant for insurance are based on sums S_n = ∑_{i=1}^{n} X_i; let us mention the Law of Large Numbers and the Central Limit Theorem, for instance. Another interesting yet less standard statistic for the actuary is M_n = max{X_1, …, X_n}, the maximum of the n claims. Extreme Value Theory mainly addresses the following question: how does M_n behave in large samples (i.e. when n tends to infinity)? Of course, without further restriction, M_n obviously diverges to +∞. Once M_n is appropriately centered and normalized, however, it may converge to some specific limit distribution (of three different types, according to the fatness of the tails of the X_i s). In insurance applications, heavy tailed distributions are most often encountered. Such distributions have survival functions that decay like a power function (in contrast to the Gamma, Inverse Gaussian or LogNormal survival functions, for instance, which all decay exponentially to zero). A prominent example of a heavy tailed distribution is the Pareto distribution, widely used by actuaries.

Excess Over Threshold Approach and Generalized Pareto Distribution

The traditional approach to Extreme Value Theory is based on extreme value limit distributions: a model for extreme losses is based on the possible parametric form of the limit distribution of maxima. A more flexible model is known as the 'Excesses Over Threshold' method. This approach appears as an alternative to maxima analysis for studying the extreme behaviour of some random variables. Basically, given a series X_1, …, X_n of independent and identically distributed random variables, the 'Excesses Over Threshold' method analyses the series [X_i − u | X_i > u], i = 1, …, n, of the exceedances of the variable over a high threshold u. Mathematical theory supports the Poisson distribution for the number of exceedances, combined with independent excesses over the threshold.

Let F_u stand for the common cumulative distribution function of the [X_i − u | X_i > u]s; F_u thus represents the conditional distribution of the losses, given that they exceed the threshold u. The two-parameter Generalized Pareto distribution function G_{ξ,τ}(·) provides a good approximation to the excess distribution F_u over large thresholds. This two-parameter family is defined as

G_{ξ,τ}(x) = G_ξ(x/τ), τ > 0,

where

G_ξ(x) = 1 − (1 + ξx)^{−1/ξ} if ξ ≠ 0,
G_ξ(x) = 1 − exp(−x) if ξ = 0,

with x ≥ 0 if ξ ≥ 0 and x ∈ [0, −1/ξ] if ξ < 0. The Generalized Pareto family contains the Exponential distribution (ξ = 0) and the type II Pareto distribution (ξ > 0) as special cases.



For some appropriate function τ(u) and some Pareto index ξ to be estimated from the data, the approximation

F_u(x) ≈ G_{ξ,τ(u)}(x), x ≥ 0,    (5.2)

holds for large u. The approximation (5.2) is justified by the following formula:

lim_{u→+∞} sup_{x≥0} |F_u(x) − G_{ξ,τ(u)}(x)| = 0,    (5.3)

which is true provided that F satisfies some rather general technical conditions; these conditions are verified by the heavy tailed distributions. In view of (5.2), the excesses [X_i − u | X_i > u] can be treated as a random sample from the Generalized Pareto distribution provided the threshold u is large enough.
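For later use, the distribution function G_{ξ,τ} is easy to put in executable form; a minimal sketch (ours, plain numpy):

    import numpy as np

    def gpd_cdf(x, xi, tau):
        """Two-parameter Generalized Pareto distribution function G_{xi,tau}."""
        z = np.asarray(x, dtype=float) / tau
        if xi == 0.0:
            return 1.0 - np.exp(-z)                    # exponential special case
        return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)     # requires 1 + xi*z > 0

    # Stability: if X ~ GPD(xi, tau), then [X - u | X > u] ~ GPD(xi, tau + xi*u),
    # so the tail index xi is unchanged by thresholding; this property is used below.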

Choice of the Threshold

We have seen from (5.2) that if the heavy tailed character of the data is fulfilled, a high enough threshold is selected and enough data are available above that threshold, then the use of Generalized Pareto distributions is justified to model the large losses. The only practical problem in applying this result is how to determine what a 'high enough threshold' is; we deal with this problem in the present section.

Two factors have to be taken into account in the choice of an optimal threshold u:

• A value of u that is too large yields few exceedances and consequently imprecise estimates.
• A value of u that is too small implies that the Generalized Pareto character does not hold for the moderate observations, and it yields biased estimates. This bias can be important, as moderate observations usually constitute the largest proportion of the sample.

Thus, our aim is to determine the minimum value of the threshold beyond which the Generalized Pareto distribution becomes a reasonable approximation to the tail of the distribution. To identify the optimal threshold value, we apply here two methods.

Generalized Pareto Index Plot

By virtue of the stability property of the Generalized Pareto distribution, if X is Generalized Pareto distributed with distribution function G_{ξ,τ}, then the variable [X − u | X > u] is Generalized Pareto distributed with distribution function G_{ξ,τ+ξu}, i.e. with the same index parameter ξ, for any u > 0. Consequently, in the plot of the index maximum likelihood estimators ξ̂ resulting from using increasing thresholds, we will observe that the estimation stabilizes once the smallest threshold for which the Generalized Pareto behaviour holds is reached.
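A minimal sketch of the index plot computation (ours, not the book's code; `costs` is assumed to be a numpy array of the claim amounts, and we rely on scipy's built-in Generalized Pareto fit rather than a hand-coded likelihood):

    import numpy as np
    from scipy.stats import genpareto

    def xi_path(costs, thresholds):
        """Maximum likelihood estimate of the tail index xi for each threshold."""
        out = []
        for u in thresholds:
            exc = costs[costs > u] - u            # excesses over u
            c, loc, scale = genpareto.fit(exc, floc=0)   # c plays the role of xi
            out.append(c)
        return np.array(out)

    # thresholds = np.quantile(costs, np.linspace(0.90, 0.999, 50))
    # Plotting thresholds against xi_path(costs, thresholds) gives the index plot:
    # the threshold is chosen where the curve flattens out.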

Gertensgarbe Plot

This procedure, proposed by Gertensgarbe & Werner (1989), is very powerful and provides an estimation of the optimal threshold. Briefly, the Gertensgarbe plot aims to select a proper threshold, based on the determination of the starting point of the extreme value region. More precisely, given the series of differences Δ_i = x_(i) − x_(i−1), i = 2, 3, …, n, of a sorted sample x_(1) ≤ x_(2) ≤ ··· ≤ x_(n), the starting point of the extreme region will be detected as a change point of the series Δ_i, i = 2, 3, …, n. The key idea is that it may be reasonably expected that the behaviour of the differences corresponding to the extreme observations will differ from the behaviour of those corresponding to the non-extreme observations. This change of behaviour will appear as a change point of the series of differences.

To identify the change point in a series, a sequential version of the Mann-Kendall test is applied. In this test, the normalized series U_i is defined as

U_i = ( U*_i − i(i−1)/4 ) / √( i(i−1)(2i+5)/72 )

where U*_i = ∑_{k=2}^{i} n_k, and n_k is the number of values in Δ_2, …, Δ_k smaller than Δ_k. Another series, denoted by U_p, is calculated by applying the same procedure to the series of differences taken from the end to the start, Δ_n, …, Δ_2, instead of from the start to the end. The intersection point between these two series, U_i and U_p, determines a probable change point, which will be significant if it exceeds a high Normal percentile.

Since these techniques usually provide only approximate information about the threshold, their simultaneous application is highly recommended in order to get more reliable results.
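A direct transcription of this construction (ours, not the book's code; quadratic in the sample size, so in practice one may restrict attention to the upper part of the ordered sample):

    import numpy as np

    def sequential_mk(d):
        """Normalized progressive Mann-Kendall series U_i for the series d."""
        U = np.empty(len(d))
        ustar = 0
        for i in range(1, len(d) + 1):
            ustar += np.sum(d[:i-1] < d[i-1])   # n_k: earlier values smaller than d_k
            mean = i * (i - 1) / 4.0
            var = i * (i - 1) * (2 * i + 5) / 72.0
            U[i-1] = (ustar - mean) / np.sqrt(var) if var > 0 else 0.0
        return U

    x = np.sort(costs)                     # `costs` assumed given
    d = np.diff(x)                         # differences of the ordered claim costs
    U_prog = sequential_mk(d)
    U_retro = sequential_mk(d[::-1])[::-1]
    # The abscissa where U_prog and U_retro cross estimates the start of the
    # extreme value region, hence the threshold.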

Application to Claim Costs Recorded in Portfolio C

The Generalized Pareto index plot is shown in Figure 5.1. We see that the estimates of the tail parameter ξ roughly stabilize after €85 000. The Gertensgarbe plot gives a threshold of €104 397 (which corresponds to the 17th largest loss). The p-value of the Mann-Kendall test is equal to 2.61%, so that the result is significant at the 5% level. Considering the two analyses, the threshold for being qualified as a large claim is set to €100 000.

Figure 5.1 Generalized Pareto index plot for the claim costs in Portfolio C (estimation of ξ against thresholds ranging from 50 000 to 150 000).

5.2.3 Generalized Pareto Fit to the Costs of Large Claims

Maximum Likelihood

Now that the threshold defining the large losses has been determined as €100 000, we model the excesses over €100 000 with the help of a Generalized Pareto distribution. Descriptive statistics for the cost of the large claims of Portfolio C are displayed in Table 5.2. The mean is equal to €364 606.2. The limited number of large losses (17 for Portfolio C) does not allow for incorporating exogenous information in these amounts. Therefore, the same parameters ξ and τ are used for all the large losses. These parameters are estimated by maximum likelihood.

Table 5.2 Descriptive statistics of the cost of large claims (Portfolio C).

Statistic           Value
Length              17
Minimum             104 386.7
Maximum             1 989 567.9
Mean                364 606.2
Standard deviation  439 882.7
25th percentile     140 032.4
Median              252 231.7
75th percentile     407 477.6
90th percentile     499 727.4
95th percentile     797 847.5
99th percentile     1 751 223.8
Skewness            2.9

Let us now fit the Generalized Pareto model to the excesses over €100 000. To this end, we use maximum likelihood theory, and we maximize the likelihood function given by

L(ξ, τ) = ∏_{i: x_i > 100 000} (1/τ) ( 1 + (ξ/τ)(x_i − 100 000) )^{−1/ξ−1}.

The log-likelihood to be maximized is

L(ξ, τ) = −#{x_i : x_i > 100 000} ln τ − (1 + 1/ξ) ∑_{i: x_i > 100 000} ln( 1 + (ξ/τ)(x_i − 100 000) ),

where #{x_i : x_i > 100 000} = 17 in Portfolio C. This optimization problem requires numerical algorithms. There are different approaches to getting starting values for the parameters ξ and τ. A natural approach consists of using moment conditions (that is, we equate the sample mean and sample variance to their theoretical expressions involving ξ and τ). The values of ξ and τ coming from the linear fit to the empirical mean excess function could also be used, as shown below.
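The whole fit can be sketched as follows (ours, not the book's code; `large` is assumed to be the array of the 17 incurred costs exceeding the threshold, and the moment starting values invert the mean and variance expressions given in the next subsection):

    import numpy as np
    from scipy.optimize import minimize

    u = 100_000.0
    exc = large - u                          # excesses over the threshold

    # Moment starting values: m = tau/(1-xi) and v = tau^2/((1-xi)^2 (1-2xi))
    # give xi = (1 - m^2/v)/2 and tau = m(1 - xi).
    m, v = exc.mean(), exc.var(ddof=1)
    xi0 = 0.5 * (1.0 - m**2 / v)
    tau0 = 0.5 * m * (1.0 + m**2 / v)

    def negloglik(p):
        xi, tau = p
        z = 1.0 + xi * exc / tau
        if tau <= 0 or np.any(z <= 0):
            return np.inf                    # outside the admissible domain
        return exc.size * np.log(tau) + (1.0 + 1.0 / xi) * np.sum(np.log(z))

    res = minimize(negloglik, x0=[xi0, tau0], method="Nelder-Mead")
    xi_hat, tau_hat = res.x   # about 0.4152 and 156 737.4 for Portfolio C (see below)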

Moment Method

The mean and the variance of the Generalized Pareto distribution are respectively given by τ/(1 − ξ), provided ξ < 1, and τ²/((1 − ξ)²(1 − 2ξ)), provided ξ < 1/2. The mean excess function e(u) = E[X − u | X > u] is estimated by

ê(u) = ∑_{i=1}^{n} (x_i − u) I[x_i > u] / #{x_i : x_i > u},

where I[A] = 1 if the event A did occur and 0 otherwise. This means that e(u) is estimated by the sum of the exceedances over the threshold u divided by the number of data points exceeding the threshold u.

Usually, the mean excess function is evaluated at the observations of the sample. Denoting the sample observations arranged in ascending order as x_(1) ≤ x_(2) ≤ ··· ≤ x_(n), we have in this case

ê_n(x_(k)) = (1/(n − k)) ∑_{j=1}^{n−k} ( x_(k+j) − x_(k) ).

It is easily checked that when X has Generalized Pareto distribution function G_{ξ,τ}, the mean excess function is a linear function of u:

e(u) = τ/(1 − ξ) + ξu/(1 − ξ),

provided τ + ξu > 0. Hence, the idea is to determine, on the basis of the graph of the empirical estimator ê_n of the mean excess function, a region [u, +∞) where ê_n(t) becomes approximately linear for t ≥ u. The intercept and slope of a straight line fitted to ê_n determine the estimations of τ and ξ.
<strong>of</strong> and .



Application to the <strong>Claim</strong> Costs Recorded in Portfolio C<br />

The initial values obtained with the the moment method applied to large claims <strong>of</strong> Portfolio<br />

C are ̂ 0 = 0319 and ̂ 0 = 180 1767. The maximum likelihood estimates <strong>of</strong> the Generalized<br />

Pareto parameters are ̂ = 04152 and ̂ = 156 7374. Different starting values have been<br />

used and the convergence always occurred.<br />

Note that the limited sample size for the large losses does not allow us to draw reliable<br />

conclusions about major claims (in particular, large sample properties <strong>of</strong> the maximum<br />

likelihood estimators cannot be invoked with a sample size as small as 17). In practice, the<br />

insurance company must gather the large losses together in a data base to perform detailed<br />

analysis. The amounts <strong>of</strong> these large losses have to be corrected for different sources <strong>of</strong><br />

inflation. The assistance <strong>of</strong> a reinsurance company is useful in this respect, especially for<br />

insurers with small to moderate portfolios.<br />

5.2.4 Modelling the Number of Large Claims

The number of large claims N_i^large for policyholder i is modelled using the Poi(λ_i^large) distribution. This is in line with the fact that large claims occur purely at random. Poisson regression is then used to incorporate the information available about policyholder i, summarized in a vector x_i^T = (x_i1, …, x_ip) consisting of explanatory variables (assumed here to be categorical and coded by means of binary variables), into the expected frequency of large claims, through a linear predictor combined with an exponential link function.

The regression coefficients are estimated with Poisson regression. All the explanatory variables introduced in Section 5.1.6 have been excluded from the model: only the intercept remained in the Poisson regression model. This is not surprising with such a small number of policies producing a large claim. We obtained β̂_0 = −9.0555, with a standard deviation equal to 0.2425. The resulting frequency of large claims is λ̂_i^large = exp(β̂_0) = 0.0117% for all policyholders.

Remark 5.1 (Logistic regression) In most cases, policyholders report 0 or just 1 large claim. Therefore, the number of large claims could also be modelled with a binary variable instead of a Poisson count. The probability that policyholder $i$ reports (at least) a large claim can then be modelled with the help of logistic regression. Specifically, let us define the binary random variable $J_i$ as

$$J_i=\begin{cases}0 & \text{if policyholder $i$ does not report any large claim during the observation period}\\ 1 & \text{if policyholder $i$ reports at least one large claim during the observation period.}\end{cases}$$

The aim is to model the probability $\Pr[J_i=0]=q_i(\boldsymbol{x}_i)$ that policyholder $i$ does not report any large claim during the coverage period.

Since $q_i(\boldsymbol{x}_i)\in(0,1)$, we resort to some distribution function $F$ to link $q_i(\boldsymbol{x}_i)$ to the linear predictor $\beta_0+\sum_{j=1}^p\beta_jx_{ij}$, that is

$$q_i(\boldsymbol{x}_i)=F\Bigl(\beta_0+\sum_{j=1}^p\beta_jx_{ij}\Bigr)\;\Longleftrightarrow\;\beta_0+\sum_{j=1}^p\beta_jx_{ij}=F^{-1}\bigl(q_i(\boldsymbol{x}_i)\bigr).$$


Theoretically, any distribution function $F$ can be used; in practice, we often take for $F$ the Normal or Logistic distribution function. For instance, the logistic regression model is specified as

$$\ln\frac{q_i(\boldsymbol{x}_i)}{1-q_i(\boldsymbol{x}_i)}=\beta_0+\sum_{j=1}^p\beta_jx_{ij}\;\Longleftrightarrow\;q_i=\frac{\exp\bigl(\beta_0+\sum_{j=1}^p\beta_jx_{ij}\bigr)}{1+\exp\bigl(\beta_0+\sum_{j=1}^p\beta_jx_{ij}\bigr)}.$$

The SAS®/STAT procedure GENMOD can be used to perform this analysis. As in the Poisson regression case, all the explanatory variables described in Section 5.1.6 are excluded from the model. The estimated probability that policyholder $i$ does not report any large claim is 99.99013 %. Hence, the corresponding large claim frequency is $-\ln(0.9999013)=0.00987\,\%$, which is not too far from the estimated $\lambda_i^{\mathrm{large}}$ obtained using Poisson regression.
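As a quick numerical cross-check of the two approaches (a sketch using only the standard library; the two intercept-level estimates are the ones quoted above):

```python
import math

# Poisson intercept-only model: expected large claim frequency.
beta0_poisson = -9.0555
lam_poisson = math.exp(beta0_poisson)    # ~1.17e-4, i.e. 0.0117 %

# Logistic model: probability of no large claim, converted to a frequency
# via Pr[N = 0] = exp(-lambda) for a Poisson count.
q_no_claim = 0.9999013
lam_logistic = -math.log(q_no_claim)     # ~9.87e-5, i.e. 0.00987 %

print(f"{lam_poisson:.6%} vs {lam_logistic:.6%}")
```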

5.2.5 Modelling the Costs of Moderate Claims

Different models can be used to describe the behaviour of the moderate claims (i.e. claims with an incurred cost less than E 100 000) as a function of the observable characteristics of the policyholder, including the Gamma, Inverse Gaussian and LogNormal distributions. We briefly review these three regression models next.

Gamma Distribution

Here we use a new parameterization of the Gamma probability density function (1.34). Specifically, we use the mean as parameter, together with a parameter related to the variation coefficient. The probability density function with the new parameters $\mu=\alpha/\tau$ and $\alpha$ is then given by

$$f(y)=\frac{1}{\Gamma(\alpha)}\left(\frac{\alpha y}{\mu}\right)^{\alpha}\exp\left(-\frac{\alpha y}{\mu}\right)\frac{1}{y}\qquad(5.4)$$

If $Y$ has probability density function (5.4), then the first moments are given by

$$\mathrm{E}[Y]=\mu\quad\text{and}\quad\mathrm{V}[Y]=\frac{\mu^2}{\alpha}$$

so that the variance is proportional to the square of the mean. Gamma regression assumes a coefficient of variation constantly equal to $\alpha^{-1/2}$. Thus it allows for heteroscedasticity (since the variance is proportional to the square of the mean, and is no more constant as in Gaussian regression models). Ideally, the Gamma regression model is best used with positive observations having a constant coefficient of variation. However, the model is robust to wide deviations from the latter assumption.

The parameter $\alpha$ controls the shape of the probability density function. Specifically, (i) if $0<\alpha\le 1$ the probability density function is decreasing (with an infinite peak at the origin when $\alpha<1$), whereas (ii) if $\alpha>1$ it is unimodal with a strictly positive mode.



Let $C_{ik}$ be the cost of the $k$th claim reported by policyholder $i$; we assume that the individual claim costs $C_{i1},C_{i2},\ldots$ are independent and identically distributed. Each $C_{ik}$ conforms to the Gamma law with mean

$$\mu_i=\mathrm{E}[C_{ik}|\boldsymbol{x}_i]=\exp\Bigl(\beta_0+\sum_{j=1}^p\beta_jx_{ij}\Bigr)\qquad(5.5)$$

and variance $\mathrm{V}[C_{ik}|\boldsymbol{x}_i]=\mu_i^2/\alpha$. Note that here we use an exponential link between the linear predictor $\beta_0+\sum_{j=1}^p\beta_jx_{ij}$ and the expected value $\mu_i$. For theoretical reasons, a reciprocal link function is sometimes preferable (but it destroys the nice multiplicative structure of the resulting price list).

Let $n_i$ be the number of claims reported by policyholder $i$, and let $c_{i1},c_{i2},\ldots,c_{in_i}$ be the corresponding claim costs. The likelihood associated with the observations is

$$\mathcal{L}=\prod_{i|n_i>0}\prod_{k=1}^{n_i}\frac{1}{\Gamma(\alpha)}\left(\frac{\alpha c_{ik}}{\mu_i}\right)^{\alpha}\exp\left(-\frac{\alpha c_{ik}}{\mu_i}\right)\frac{1}{c_{ik}}.$$

The corresponding log-likelihood is given by

$$L=\ln\mathcal{L}=\sum_{i|n_i>0}\left(-n_i\ln\Gamma(\alpha)+n_i\alpha\ln\alpha-n_i\alpha\Bigl(\beta_0+\sum_{j=1}^p\beta_jx_{ij}\Bigr)+\alpha\sum_{k=1}^{n_i}\ln c_{ik}-\alpha\frac{\sum_{k=1}^{n_i}c_{ik}}{\mu_i}\right)+\text{constant}.$$

The likelihood equations are given by

$$\frac{\partial}{\partial\beta_j}L=0\;\Longleftrightarrow\;\sum_{i|n_i>0}x_{ij}\Bigl(n_i-\frac{c_{i\bullet}}{\mu_i}\Bigr)=0$$

for $j=1,\ldots,p$, where $c_{i\bullet}=\sum_{k=1}^{n_i}c_{ik}$ is the total cost of the standard claims reported by policyholder $i$. The maximum likelihood estimators are obtained with the help of Newton-Raphson techniques. The estimation of $\alpha$ can be performed by maximum likelihood as in Chapter 2, or it can be obtained from the Pearson- or deviance-based dispersion statistic.
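The likelihood equations above can be solved directly with a short Newton-Raphson iteration; the following is an illustrative sketch (not the book's GENMOD workflow), with simulated data and hypothetical coefficients:

```python
import numpy as np

def fit_gamma_loglink(X, n, c_tot, n_iter=25):
    """Solve sum_i x_ij (n_i - c_i./mu_i) = 0 with mu_i = exp(x_i' beta)
    by Newton-Raphson on the score of the Gamma log-likelihood."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(c_tot.sum() / n.sum())        # start at the grand mean
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (n - c_tot / mu)
        info = X.T @ (X * (c_tot / mu)[:, None])   # derivative of the score
        beta = beta - np.linalg.solve(info, score)
    return beta

# Simulated illustration with two binary rating factors.
rng = np.random.default_rng(0)
m = 5000
X = np.column_stack([np.ones(m), rng.integers(0, 2, m), rng.integers(0, 2, m)])
n = np.ones(m)                                     # one claim per policy
mu_true = np.exp(X @ np.array([7.0, 0.3, -0.2]))
c_tot = rng.gamma(shape=2.0, scale=mu_true / 2.0)  # Gamma costs, mean mu_true
print(fit_gamma_loglink(X, n, c_tot))              # approx. [7.0, 0.3, -0.2]
```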

Remark 5.2 Often, only the total claim amount $C_{i\bullet}$ is available, and not the individual $C_{ik}$s. In such a case, it is convenient to work with the mean claim amount $\overline{C}_i=C_{i\bullet}/n_i$, where $n_i$ is the number of claims reported by policyholder $i$. Considering the Gamma likelihood equations, this is not restrictive since only the total claim amount is needed. Specifically, Formula (1.36) shows that the Gamma distributions are closed under convolution in some particular cases. In the new parameterization of the Gamma family used in the present chapter, the moment generating function of $C_{ik}$ is

$$M(t)=\left(1-\frac{t\mu_i}{\alpha}\right)^{-\alpha}$$


so that the moment generating function of $\overline{C}_i$ is

$$\left(M\Bigl(\frac{t}{n_i}\Bigr)\right)^{n_i}=\left(1-\frac{t\mu_i}{n_i\alpha}\right)^{-n_i\alpha}=\left(1-\frac{t\mu_i}{\alpha_i'}\right)^{-\alpha_i'}$$

where $\alpha_i'=n_i\alpha$. Hence, the arithmetic average of the $C_{ik}$s conforms to the $\mathcal{G}am(\mu_i,n_i\alpha)$ distribution in the mean-shape parameterization used here. This situation is accounted for in GENMOD by specifying an appropriate weight $n_i$.

Example 5.1 (Gamma Regression for the Moderate Claim Costs in Portfolio C) The Gamma regression performed on the claim costs recorded in Portfolio C leads to the results in Table 5.3, where the following variables have been eliminated: Fuel (p-value of 94.76 %), Gender (p-value of 88.89 %), Use (p-value of 27.48 %) and Power (p-value of 9.28 %). Moreover, for Agev, levels 6–10 and > 10 have been grouped together in a class > 5. For the variable Ageph, levels 31–60 and > 60 have been grouped in a class > 30. The resulting log-likelihood is equal to −147 629.10. Type 3 analysis is presented in the following table:

Source   DF   Chi-square   Pr>Chi-sq
Ageph    2    65.92        < .0001



Remark 5.3 (Tweedie Generalized Linear Models) The Tweedie distributions are a three-parameter family. They allow for any power variance function and any power link. The Tweedie family includes the Gaussian, Poisson, Gamma and Inverse Gaussian families as special cases. Specifically, let $\mu_i=\mathrm{E}[Y_i]$ be the expectation of the $i$th response $Y_i$. We assume that

$$\mu_i^{q_1}=\beta_0+\sum_{j=1}^p\beta_jx_{ij}\quad\text{and}\quad\mathrm{V}[Y_i]=\phi\mu_i^{q_2}$$

where $\boldsymbol{x}_i$ is a vector of covariates and $\boldsymbol{\beta}$ is a vector of regression coefficients, for some $\phi$, $q_1$ and $q_2$. A value of zero for $q_1$ is interpreted as $\ln\mu_i=\beta_0+\sum_{j=1}^p\beta_jx_{ij}$. The variance power $q_2$ characterizes the distribution of the responses $Y$. The parameter $q_2$ is called the index parameter and determines the shape of the Tweedie distribution. For various values of $q_2$, we find the following particular cases: $q_2=0$ corresponds to the Normal distribution, $q_2=1$ corresponds to the Poisson distribution, $1<q_2<2$ corresponds to the compound Poisson distribution with Gamma summands, $q_2=2$ corresponds to the Gamma distribution, $q_2=3$ corresponds to the Inverse Gaussian distribution, and the other values of $q_2>2$ correspond to stable distributions for positive continuous data.

As stated above, for $1<q_2<2$ the Tweedie distribution is a compound Poisson sum of Gamma random variables, so that it can also serve as a model for the annual claim amount itself, with a probability mass at zero.

Inverse Gaussian Distribution

As in the Gamma case, we parameterize the Inverse Gaussian distribution by its mean $\mu_i=\mathrm{E}[C_{ik}|\boldsymbol{x}_i]=\exp(\beta_0+\sum_{j=1}^p\beta_jx_{ij})$, the variance being $\mathrm{V}[C_{ik}|\boldsymbol{x}_i]=\tau^2\mu_i^3$. The likelihood associated with the observations is

$$\mathcal{L}(\tau^2)=\prod_{i|n_i>0}\prod_{k=1}^{n_i}\frac{1}{\sqrt{2\pi\tau^2c_{ik}^3}}\exp\left(-\frac{(c_{ik}-\mu_i)^2}{2\tau^2\mu_i^2c_{ik}}\right).$$


The log-likelihood is

$$L(\tau^2)=\ln\mathcal{L}(\tau^2)=\sum_{i|n_i>0}\left(-\frac{n_i}{2}\ln(2\pi\tau^2)-\sum_{k=1}^{n_i}\frac{(c_{ik}-\mu_i)^2}{2\tau^2\mu_i^2c_{ik}}\right)+\text{constant}$$

so that the likelihood equations are given by

$$\frac{\partial}{\partial\beta_j}L(\tau^2)=0\;\Longleftrightarrow\;\frac{\partial}{\partial\beta_j}\sum_{i|n_i>0}\sum_{k=1}^{n_i}\left(1-\frac{c_{ik}}{\mu_i}\right)^2\frac{1}{c_{ik}}=0\;\Longleftrightarrow\;\sum_{i|n_i>0}x_{ij}\,\frac{1}{\mu_i}\left(n_i-\frac{c_{i\bullet}}{\mu_i}\right)=0.$$

The estimation of $\tau^2$ can be performed by maximum likelihood as in Chapter 2, or it can be obtained from the Pearson- or deviance-based dispersion statistic.

Remark 5.4 As pointed out for the Gamma distribution in Remark 5.2, the actuary often only has at his disposal the total claim amount $C_{i\bullet}$, and not the individual $C_{ik}$s. This is not really a problem since the likelihood equations only involve $C_{i\bullet}$. Considering (1.40), we see that the moment generating function of the mean claim amount $\overline{C}_i$ in the new parameterization is given by

$$\exp\left(\frac{n_i}{\tau^2\mu_i}\left(1-\sqrt{1-\frac{2\tau^2\mu_i^2t}{n_i}}\right)\right)$$

which corresponds to the Inverse Gaussian distribution with parameters $\mu_i$ and $\tau^2/n_i$. As in the Gamma case, working with the average claim amounts is not restrictive for maximum likelihood estimation of the regression parameters, and this situation is accounted for in GENMOD by specifying an appropriate weight $n_i$.

Example 5.2 (Inverse Gaussian Regression for the Moderate Claim Costs in Portfolio C) The results of the Inverse Gaussian regression are given in Table 5.4, where the following variables have been eliminated: Fuel (p-value of 96.29 %), Gender (p-value of 84.58 %), Power (p-value of 78.32 %), Use (p-value of 56.40 %), Premium split (p-value of 23.04 %), City (p-value of 9.75 %) and Coverage (p-value of 5.37 %). Moreover, for Agev, levels 3–5, 6–10 and > 10 have been grouped together in a class > 2. For the variable Ageph, levels 31–60 and > 60 have been grouped in a class > 30. The resulting log-likelihood is equal to −150 214.60. Type 3 analysis is presented in the following table:

Source   DF   Chi-square   Pr>Chi-sq
Ageph    2    10.56        0.0012
Agev     1    41.87        < .0001


Table 5.4 Results of the Inverse Gaussian regression on the claim costs recorded in Portfolio C.

Variable    Level   Coeff    Std error   Wald 95 % conf limit   Chi-sq     Pr>Chi-sq
Intercept           7.1169   0.0282      7.0616   7.1722        63 634.3   < .0001
Ageph       18–24   0.2293   0.1104      0.0129   0.4457        4.31       0.0378
Ageph       25–30   0.1699   0.0697      0.0334   0.3065        5.95       0.0147
Ageph       > 30    0        0           0        0             .          .
Agev        0–2     0.1609   0.0756      0.0127   0.3091        4.53       0.0334
Agev        > 2     0        0           0        0             .          .

LogNormal Distribution

Before the generalized linear models gained popularity in the actuarial profession, claim sizes were often analysed using a Normal linear regression model after having been transformed to the log-scale. Although the results are usually quite similar for this method and Gamma regression, the latter approach is easier to interpret since it does not require any logarithmic transformation of the claim costs.

Assume that the moderate claim sizes for policyholder $i$ are independent and LogNormally distributed, with parameters $\beta_0+\sum_{j=1}^p\beta_jx_{ij}$ and $\sigma^2$. Specifically, the $C_{ik}$s are independent, and identically distributed for fixed $i$, with $\ln C_{ik}\sim\mathcal{N}or(\beta_0+\sum_{j=1}^p\beta_jx_{ij},\sigma^2)$. Let $n_i$ be the number of claims reported by policyholder $i$, and let $c_{i1},c_{i2},\ldots,c_{in_i}$ be the corresponding claim costs. The likelihood associated with the observations is

$$\mathcal{L}=\prod_{i|n_i>0}\prod_{k=1}^{n_i}\frac{1}{\sqrt{2\pi}\,\sigma c_{ik}}\exp\left(-\frac{1}{2\sigma^2}\Bigl(\ln c_{ik}-\beta_0-\sum_{j=1}^p\beta_jx_{ij}\Bigr)^2\right).$$

The maximum likelihood estimators are obtained with the help of Newton-Raphson techniques. The average cost of a standard claim for policyholder $i$ is then obtained from the formula

$$\mathrm{E}[C_{ik}|\boldsymbol{x}_i]=\exp\left(\beta_0+\sum_{j=1}^p\beta_jx_{ij}+\frac{\sigma^2}{2}\right).$$

Note that, in contrast to the Gamma and Inverse Gaussian cases, we cannot easily deal with the situation where only the total amount of moderate claims is available. This is due to the fact that the LogNormal family of distributions is not closed under convolution. Therefore, we fit the model to the observations made on policyholders having filed a single standard claim.

Example 5.3 (LogNormal Regression for the Moderate Claim Costs in Portfolio C) The LogNormal regression cannot be performed with the help of the SAS®/STAT procedure GENMOD (which does not support the LogNormal distribution). Often in practice, the data are first transformed to the log-scale, and a standard linear model is then fitted to the logarithms of the claim amounts. This ad hoc procedure will be avoided here, and a maximum likelihood estimation procedure is performed on the original claim costs.


Table 5.5 Results of the LogNormal regression analysis for the claim costs recorded in Portfolio C.

Variable    Level   Coeff    Std error   Wald 95 % conf limit   t-value   Pr>|t|
Intercept           6.1223   0.0268      6.0702   6.1744        230.38    < .0001
Ageph       > 60    0.1781   0.0312      0.1169   0.2392        5.71      < .0001

Type 3 analysis is presented in the following table:

Source   DF   Chi-square   Pr>Chi-sq
Ageph    2    40.58        < .0001
City     1    7.70         0.0055
Agev     3    38.20        < .0001

All three remaining variables are statistically significant, and must be kept in the model.

5.2.6 Resulting Price List for Portfolio C

Formula for the Pure Premium

Let $\boldsymbol{\beta}^{\mathrm{freq}}$ be the vector of the regression coefficients for the claim frequencies, that is, the expected annual number of standard claims is $\exp(\beta_0^{\mathrm{freq}}+\sum_{j=1}^p\beta_j^{\mathrm{freq}}x_{ij})$. Similarly, let $\boldsymbol{\beta}^{\mathrm{cost}}$ be the vector of the regression coefficients for the moderate claim sizes. We retain here the LogNormal modelling for the moderate claim sizes, so that the expected moderate claim amount is $\exp(\beta_0^{\mathrm{cost}}+\sum_{j=1}^p\beta_j^{\mathrm{cost}}x_{ij}+\sigma^2/2)$. Neglecting the large claims, the pure premium for policyholder $i$ is given by

$$\mathrm{E}\left[\sum_{k=1}^{N_i^{\mathrm{small}}}C_{ik}\right]=\mathrm{E}\bigl[N_i^{\mathrm{small}}\bigr]\,\mathrm{E}[C_{i1}]=\exp\left(\beta_0^{\mathrm{freq}}+\beta_0^{\mathrm{cost}}+\sum_{j=1}^p\bigl(\beta_j^{\mathrm{freq}}+\beta_j^{\mathrm{cost}}\bigr)x_{ij}+\frac{\sigma^2}{2}\right).$$

If all the components of $\boldsymbol{x}_i$ are binary, we then get a multiplicative price list. The total premium is then obtained by adding the expected cost of the large losses, i.e. $\mathrm{E}[N_i^{\mathrm{large}}]\,\mathrm{E}[L_{i1}]$.


We still need to estimate $\boldsymbol{\beta}^{\mathrm{freq}}$ to be able to compute the pure premium for the different categories of policyholders in Portfolio C.

Risk Classification for Claim Frequencies

Here we follow the method described in Chapter 2 for analysing the observed claim numbers. A Poisson regression is first performed on the claim frequencies of Portfolio C. This leads to the results of Table 5.6. Only the Use of the vehicle has been removed from the model (p-value of 34.48 %). The levels 0–2, 6–10 and > 10 of the variable Agev are grouped together. The log-likelihood of the final model is equal to −61 563.9 and the Type 3 analysis is presented in the following table:

Source   DF   Chi-square   Pr>Chi-sq
Ageph    3    892.66       < .0001


As explained in Chapter 2, overdispersion can be detected by using the statistics $T_1$, $T_2$ or $T_3$ presented in Section 2.4.6. The values obtained with Portfolio C are $T_1=20.04$, $T_2=14.56$ and $T_3=12.89$. All the associated p-values are less than $10^{-4}$, leading to the rejection of the null hypothesis in favour of a mixed Poisson model.

In order to take the residual heterogeneity into account, we fitted a Negative Binomial regression model to Portfolio C. The results are given in Table 5.7. All the variables and levels are still relevant. The log-likelihood is now equal to −61 393.3 and is better than with Poisson regression.

Resulting Price List

The resulting price list is obtained thanks to the Negative Binomial model for the frequency (presented in Table 5.7) and to the LogNormal model for the average cost of the standard claims (presented in Table 5.5).

Neglecting the large claims, the pure premium of the reference class is obtained by

$$\exp\left(\beta_0^{\mathrm{freq}}+\beta_0^{\mathrm{cost}}+\frac{\sigma^2}{2}\right)=\text{E }316.76\qquad(5.6)$$

This amount corresponds to the pure premium of a male policyholder living in an urban area, aged between 31 and 60, driving a car older than 10 years, using gasoil and with a power greater than 110 kW, paying his premium in several installments and having opted for a more extensive coverage than only the compulsory motor third party liability insurance. To obtain the pure premium (neglecting the large claims) for a policyholder belonging to another risk class, the percentages of Table 5.8 must be applied to the base pure premium

Table 5.7 Results of the Negative Binomial regression of the claim counts recorded in Portfolio C.

Variable    Level   Coeff     Std error   Wald 95 % conf limit   Chi-sq   Pr>Chi-sq
Intercept           −1.4626   0.0720      −1.6037   −1.3216      413.06   < .0001


Table 5.8 Price list for the standard claims in Portfolio C.

Variable        Level       Influence of frequency   Influence of cost   Total influence
Ageph           18–24       190.33 %                 119.08 %            226.64 %
Ageph           25–30       143.39 %                 100.00 %            143.39 %
Ageph           > 60        79.97 %                  119.49 %            95.56 %
Ageph           31–60       100.00 %                 100.00 %            100.00 %
Gender          Female      106.22 %                 100.00 %            106.22 %
Gender          Male        100.00 %                 100.00 %            100.00 %
Agev            0–2         100.00 %                 88.44 %             88.44 %
Agev            3–5         93.97 %                  80.93 %             76.05 %
Agev            6–10        100.00 %                 89.07 %             89.07 %
Agev            > 10        100.00 %                 100.00 %            100.00 %
Fuel            Petrol      82.56 %                  100.00 %            82.56 %
Fuel            Gasoil      100.00 %                 100.00 %            100.00 %
Premium split   No          75.94 %                  100.00 %            75.94 %
Premium split   Yes         100.00 %                 100.00 %            100.00 %
Coverage        MTPL only   112.66 %                 100.00 %            112.66 %
Coverage        More        100.00 %                 100.00 %            100.00 %
City            Rural       79.69 %                  106.83 %            85.14 %
City            Urban       100.00 %                 100.00 %            100.00 %
Power           < 66 kW     74.45 %                  100.00 %            74.45 %
Power           66–110 kW   82.60 %                  100.00 %            82.60 %
Power           > 110 kW    100.00 %                 100.00 %            100.00 %

(5.6) as a function of the characteristics of the policyholder. For example, the pure premium of a woman aged between 25 and 30, living in a rural region, driving a car older than 10 years, using petrol and with a power less than 66 kW, paying her premium once a year and being covered only for the MTPL will be equal to

E 316.76 × 106.22 % (correction for being a female policyholder)
× 143.39 % (correction for being aged between 25 and 30)
× 85.14 % (correction for living in a rural area)
× 82.56 % (correction for using petrol)
× 74.45 % (correction for driving a low-power car)
× 75.94 % (correction for paying the premium once a year)
× 112.66 % (correction for buying MTPL coverage only)
= E 216.01

Finally, to obtain the total pure premium of a policyholder, the expected cost of large claims (which is independent of the explanatory variables) must be added. The value is equal to

0.0117 % × E 368 018.80 = E 43.06


The total pure premium of a policyholder belonging to the reference class is then

E 316.76 + E 43.06 = E 359.82

Note that the fitted mean equal to E 368 018.80 is close to the observed mean of the 17 large losses recorded in Table 5.2.
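The multiplicative structure makes the tariff easy to verify in a few lines (a sketch; the base premium, the Table 5.8 relativities and the large claim loading are exactly the values quoted above):

```python
# Base pure premium of the reference class, Formula (5.6).
base = 316.76

# Corrections from Table 5.8 for the example policyholder.
factors = [1.0622,   # female
           1.4339,   # aged 25-30
           0.8514,   # rural area
           0.8256,   # petrol
           0.7445,   # power < 66 kW
           0.7594,   # premium paid once a year
           1.1266]   # MTPL coverage only

premium_small = base
for f in factors:
    premium_small *= f
print(round(premium_small, 2))     # ~216.01

# Large claim loading, identical for all policyholders.
loading = 0.000117 * 368_018.80    # 0.0117 % times the fitted mean large loss
print(round(base + loading, 2))    # 316.76 + 43.06 = 359.82
```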

Recall that the analysis of large claims performed in this chapter is based on an estimation of their final cost six months after the end of the observation year 1997. We thus work with incurred losses (payments plus reserves). In practice, the company should maintain a database recording the costs of large claims that have occurred in the past, corrected for the different sources of inflation (reinsurance companies can often provide valuable assistance to the ceding companies in this respect). The typical price for a reinsurance treaty covering motor third party liability insurance losses in excess of E 350 000 represents about 5 % of the total motor premium income of a Belgian insurance company, so that the expected cost of large claims computed in Portfolio C seems to be of the right order of magnitude.

Remark 5.5 If the sum of the individual pure premiums obtained above exceeds the observed total loss for the insurance portfolio during the reference period, or if we expect larger losses in the future (because, e.g., of different sources of inflation), we can then keep the same relative premium amounts applied to the anticipated future total claim cost. This allows us to incorporate in the individual premiums observed trends in the total claim amount.

5.3 Measures of Efficiency for Bonus-Malus Scales

The elasticity of a bonus-malus system measures its response to a change in the expected claim frequency or expected aggregate claim amount. We expect that the premium paid by the policyholders subject to bonus-malus scales is increasing in the expected claim frequency or total claim amount. The rate of increase is related to the concept of efficiency.

5.3.1 Loimaranta Efficiency

Definition

Let us denote as $\bar{r}(\vartheta)$ the average relativity once stationarity has been reached, for a policyholder with annual expected claim frequency $\vartheta$, i.e.

$$\bar{r}(\vartheta)=\sum_{l=0}^{s}\pi_l(\vartheta)\,r_l.$$

The Loimaranta efficiency $\mathrm{Eff}_{\mathrm{Loi}}(\vartheta)$ is then defined as the elasticity of the relative premium induced by the bonus-malus system, that is,

$$\mathrm{Eff}_{\mathrm{Loi}}(\vartheta)=\frac{\mathrm{d}\bar{r}(\vartheta)/\bar{r}(\vartheta)}{\mathrm{d}\vartheta/\vartheta}=\frac{\mathrm{d}\ln\bar{r}(\vartheta)}{\mathrm{d}\ln\vartheta}.$$


Computation

The computation of $\mathrm{Eff}_{\mathrm{Loi}}(\vartheta)$ requires the determination of the derivative of $\bar{r}(\vartheta)$ with respect to the annual expected claim frequency $\vartheta$. This derivative is given by

$$\frac{\mathrm{d}\bar{r}(\vartheta)}{\mathrm{d}\vartheta}=\sum_{l=0}^{s}\frac{\mathrm{d}\pi_l(\vartheta)}{\mathrm{d}\vartheta}\,r_l$$

so that its computation requires the derivatives of the stationary probabilities $\pi_l(\vartheta)$ with respect to the annual expected claim frequency $\vartheta$. To get the $\mathrm{d}\pi_l(\vartheta)/\mathrm{d}\vartheta$, it suffices to differentiate (4.8): we thus have to solve the linear system

$$\begin{cases}\dfrac{\mathrm{d}\boldsymbol{\pi}^T(\vartheta)}{\mathrm{d}\vartheta}=\dfrac{\mathrm{d}\boldsymbol{\pi}^T(\vartheta)}{\mathrm{d}\vartheta}\,P(\vartheta)+\boldsymbol{\pi}^T(\vartheta)\,\dfrac{\mathrm{d}P(\vartheta)}{\mathrm{d}\vartheta}\\[2mm] \displaystyle\sum_{l=0}^{s}\frac{\mathrm{d}\pi_l(\vartheta)}{\mathrm{d}\vartheta}=0\end{cases}$$

with respect to the $\mathrm{d}\pi_l(\vartheta)/\mathrm{d}\vartheta$s.

Global Efficiency

So far, we have defined the Loimaranta efficiency for a given value of the expected annual claim frequency. To get a value for the portfolio, we have to account for its composition with respect to rating factors as well as its residual heterogeneity. Hence, the Loimaranta efficiency for the portfolio is obtained by averaging over all the possible values of $\vartheta$ as

$$\overline{\mathrm{Eff}}_{\mathrm{Loi}}=\mathrm{E}\bigl[\mathrm{Eff}_{\mathrm{Loi}}(\vartheta)\bigr].$$
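In practice the elasticity is conveniently approximated by a central finite difference on $\ln\bar r(\vartheta)$ rather than by solving the differentiated system. The sketch below does this for a hypothetical 6-level −1/top scale; the relativities are placeholders, not the scale actually fitted to Portfolio A:

```python
import numpy as np

def transition_matrix(theta, s=5):
    """One-year transition matrix of a 6-level -1/top scale: one level
    down after a claim-free year, straight to level s otherwise."""
    p0 = np.exp(-theta)                    # Poisson probability of no claim
    P = np.zeros((s + 1, s + 1))
    for l in range(s + 1):
        P[l, max(l - 1, 0)] += p0          # bonus move
        P[l, s] += 1.0 - p0                # malus move
    return P

def stationary(P):
    """Stationary distribution: pi^T = pi^T P with sum(pi) = 1, cf. (4.8)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

def loimaranta(theta, r, h=1e-5):
    """Elasticity d ln rbar / d ln theta via central finite differences."""
    rbar = lambda t: stationary(transition_matrix(t)) @ r
    return ((np.log(rbar(theta + h)) - np.log(rbar(theta - h)))
            / (np.log(theta + h) - np.log(theta - h)))

r = np.array([0.624, 1.302, 1.429, 2.077, 2.414, 3.091])  # placeholder relativities
print(loimaranta(0.1409, r))
```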

Loimaranta Efficiency in Portfolio A

Table 5.9 displays the Loimaranta efficiencies for a good driver, with annual expected claim frequency 9.28 %, for an average driver with annual expected claim frequency 14.09 %, and for a bad driver with annual expected claim frequency 28.40 %. For the −1/top bonus-malus scale, the efficiency is larger for the average driver than for the good and bad ones. On the contrary, for the −1/+2 and −1/+3 bonus-malus scales, the efficiencies appear to increase from the good to the average driver, and from the average driver to the bad one. The global efficiencies listed in Table 5.9 are rather poor, ranging from 23.23 % for the −1/top bonus-malus scale to 28.39 % in the −1/+2 bonus-malus scale. This means that these bonus-malus systems respond only weakly to a change in the underlying claim frequency.

Table 5.9 Loimaranta efficiency for three types of insured drivers (a good driver, with annual expected claim frequency 9.28 %, an average driver with annual expected claim frequency 14.09 %, and a bad driver with annual expected claim frequency 28.40 %) and global efficiency for Portfolio A, for the −1/top, −1/+2 and −1/+3 bonus-malus scales.

Frequency     Scale −1/top   Scale −1/+2   Scale −1/+3
0.0928        0.2865         0.2380        0.2987
0.1409        0.3144         0.3793        0.4008
0.2840        0.2901         0.6204        0.4733
Portfolio A   0.2323         0.2839        0.2775


Figure 5.2 displays the Loimaranta efficiency as a function of the annual expected claim frequencies for the three bonus-malus systems. We see that the efficiency first increases and then decreases, reaching its maximum value at about 15 % for the −1/top bonus-malus system, at about 30 % for the −1/+2 bonus-malus system, and at about 25 % for the −1/+3 bonus-malus system.

5.3.2 De Pril Efficiency

Definition

Loimaranta efficiency is an asymptotic concept that does not depend on the level presently occupied in the scale. De Pril efficiency is a transient concept that explicitly considers the time value of money.

Let $v<1$ be the annual discount factor, and denote as $V_l(\vartheta)$ the expected present value of all the premiums paid by a policyholder with annual expected claim frequency $\vartheta$ starting from level $l$. The $V_l(\vartheta)$s satisfy the system

$$V_l(\vartheta)=b_l+v\sum_{k=0}^{\infty}\exp(-\vartheta)\frac{\vartheta^k}{k!}\,V_{T_k(l)}(\vartheta),\qquad l=0,1,\ldots,s,\qquad(5.7)$$

where $b_l$ is the premium paid in level $l$ and $T_k(l)$ is the level reached from level $l$ when $k$ claims are reported during the year.


Figure 5.2 Loimaranta efficiency as a function of the annual expected claim frequencies for the three bonus-malus systems (Scale −1/top, Scale −1/+2, Scale −1/+3) and Portfolio A.


De Pril efficiency $\mathrm{Eff}_{\mathrm{DeP}}(l;\vartheta)$ is thus defined analogously to $\mathrm{Eff}_{\mathrm{Loi}}(\vartheta)$, substituting $V_l(\vartheta)$ for $\bar{r}(\vartheta)$. Note that $\mathrm{Eff}_{\mathrm{DeP}}(l;\vartheta)$ now depends on the starting level $l$. The initial class can then be selected so as to maximize $\mathrm{Eff}_{\mathrm{DeP}}(l;\vartheta)$.

Computation

To compute $\mathrm{Eff}_{\mathrm{DeP}}(l;\vartheta)$, we need the derivatives $\mathrm{d}V_l(\vartheta)/\mathrm{d}\vartheta$ of the $V_l(\vartheta)$ satisfying (5.7). These derivatives can be obtained by solving the system

$$\frac{\mathrm{d}V_l(\vartheta)}{\mathrm{d}\vartheta}=v\sum_{k=0}^{\infty}\exp(-\vartheta)\frac{\vartheta^k}{k!}\left(\Bigl(\frac{k}{\vartheta}-1\Bigr)V_{T_k(l)}(\vartheta)+\frac{\mathrm{d}V_{T_k(l)}(\vartheta)}{\mathrm{d}\vartheta}\right),\qquad l=0,\ldots,s.$$

This system admits a unique solution.
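In matrix form, (5.7) reads $(I-vP(\vartheta))V=b$, with $P(\vartheta)$ the one-year transition matrix of the scale, so the $V_l(\vartheta)$s follow from a single linear solve; the derivative can again be approximated by a finite difference. A self-contained sketch for the same hypothetical −1/top scale as above (the $b_l$ values are placeholders):

```python
import numpy as np

def transition_matrix(theta, s=5):
    """-1/top scale: down one level if no claim, straight to level s otherwise."""
    p0 = np.exp(-theta)
    P = np.zeros((s + 1, s + 1))
    for l in range(s + 1):
        P[l, max(l - 1, 0)] += p0
        P[l, s] += 1.0 - p0
    return P

def present_values(theta, b, v=1.0 / 1.04):
    """Solve V = b + v P(theta) V, i.e. (I - vP) V = b, cf. (5.7)."""
    P = transition_matrix(theta)
    return np.linalg.solve(np.eye(len(b)) - v * P, b)

def de_pril_efficiency(theta, b, level=5, h=1e-5):
    """Elasticity d ln V_level / d ln theta by central finite differences."""
    V = lambda t: present_values(t, b)[level]
    return ((np.log(V(theta + h)) - np.log(V(theta - h)))
            / (np.log(theta + h) - np.log(theta - h)))

b = np.array([0.624, 1.302, 1.429, 2.077, 2.414, 3.091])  # placeholder premiums b_l
print(de_pril_efficiency(0.1409, b))
```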

Global Efficiency

At the portfolio level, the efficiency is then obtained by averaging over all the possible values of $\vartheta$, that is,

$$\overline{\mathrm{Eff}}_{\mathrm{DeP}}(l)=\mathrm{E}\bigl[\mathrm{Eff}_{\mathrm{DeP}}(l;\vartheta)\bigr].$$

De Pril Efficiency in Portfolio A

Table 5.10 displays the De Pril efficiencies associated with the highest level 5 for a good driver, with annual expected claim frequency 9.28 %; for an average driver with annual expected claim frequency 14.09 %; and for a bad driver with annual expected claim frequency 28.40 %. The discount factor is taken to be $v=1/1.04$. De Pril efficiency behaves roughly as the Loimaranta efficiency displayed in Table 5.9.

Figure 5.3 displays the De Pril efficiency as a function of the annual expected claim frequencies for the three bonus-malus systems. We see that the efficiency first increases and then decreases, reaching its maximum value at about the same frequencies as the Loimaranta efficiency. The shape of both efficiencies is pretty much the same.

Let us now use the De Pril efficiency to select the optimal starting level. We have computed in Table 5.11 the values of $\mathrm{Eff}_{\mathrm{DeP}}(l;\vartheta)$ and $\overline{\mathrm{Eff}}_{\mathrm{DeP}}(l)$ according to the initial level. We see that the optimal starting level is 0 for the three −1/top, −1/+2 and −1/+3 bonus-malus scales.

Table 5.10 De Pril efficiency for three types of insured drivers (a good driver, with annual expected claim frequency 9.28 %, an average driver with annual expected claim frequency 14.09 %, and a bad driver with annual expected claim frequency 28.40 %) and global efficiency for Portfolio A, for the −1/top, −1/+2 and −1/+3 bonus-malus scales (starting level: level 5).

Frequency     Scale −1/top   Scale −1/+2   Scale −1/+3
0.0928        0.2192         0.1871        0.2252
0.1409        0.2482         0.2880        0.3039
0.2840        0.2414         0.4633        0.3741
Portfolio A   0.1817         0.2186        0.2150


Figure 5.3 De Pril efficiency as a function of the annual expected claim frequencies for the three bonus-malus systems (Scale −1/top, Scale −1/+2, Scale −1/+3) and Portfolio A.


Table 5.11 De Pril efficiency according to the starting level, for the −1/top, −1/+2 and −1/+3 bonus-malus scales.

−1/top scale
Initial level   Eff_DeP(l; 0.0928)   Eff_DeP(l; 0.1409)   Eff_DeP(l; 0.2840)   Global Eff_DeP(l)
0               0.2711379            0.3018164            0.2877550            0.2230948
1               0.2644811            0.2952273            0.2826536            0.2179881
2               0.2558848            0.2863492            0.2746372            0.2109572
3               0.2454854            0.2755109            0.2648401            0.2025042
4               0.2332928            0.2628293            0.2537603            0.1927769
5               0.2191762            0.2482090            0.2414355            0.1817001

−1/+2 scale
Initial level   Eff_DeP(l; 0.0928)   Eff_DeP(l; 0.1409)   Eff_DeP(l; 0.2840)   Global Eff_DeP(l)
0               0.2168178            0.3422846            0.5689437            0.2622440
1               0.2159419            0.3399076            0.5617233            0.2597218
2               0.2156974            0.3369153            0.5488944            0.2558833
3               0.2120306            0.3274338            0.5247412            0.2474823
4               0.2025930            0.3111890            0.4963046            0.2351387
5               0.1870807            0.2878845            0.4633192            0.2186229

−1/+3 scale
Initial level   Eff_DeP(l; 0.0928)   Eff_DeP(l; 0.1409)   Eff_DeP(l; 0.2840)   Global Eff_DeP(l)
0               0.2733728            0.3689638            0.4526559            0.2605588
1               0.2691461            0.3629990            0.4449703            0.2563432
2               0.2648276            0.3556598            0.4328504            0.2508412
3               0.2572639            0.3446637            0.4178960            0.2429854
4               0.2429893            0.3261114            0.3971923            0.2302266
5               0.2252077            0.3038548            0.3740766            0.2149911

5.4 Bonus Hunger and Optimal Retention

5.4.1 Correcting the Estimations for Censoring

As explained in the introduction, the policyholders subject to a bonus-malus mechanism tend to self-defray minor accidents to avoid premium surcharges. This means that the number of accidents is a censored variable: the insurer only knows the number of claims filed by the insured drivers, and not the number of accidents they caused. We develop here a simple statistical model allowing for censorship in the observed claim costs (and thus also in the observed numbers of claims reported to the insurer). The claiming threshold is considered here as a random variable, specific to each policyholder and with a distribution depending on the level occupied in the bonus-malus scale at the beginning of the observation period as well as on observable characteristics.

Specifically, let us consider the LogNormal model for moderate claim sizes: the claim costs are then seen as independent and identically distributed realizations of LogNormal random variables in each risk class. Now, each policyholder in this class reports an accident to the insurer if its cost exceeds a random threshold, assumed to be LogNormally distributed with parameters specific to the level occupied in the scale. Note that we deal here with moderate claim sizes only. The reason is that large claims are not subject to bonus hunger and are systematically reported to the insurer. The frequency of large claims will have to be added to the corrected frequency of moderate claims to get the actual number of accidents caused each year.

Let $l_i$ be the level occupied by policyholder $i$ in the bonus-malus scale at the beginning of the period, and let $RL_i(l_i)\sim\mathcal{LN}or(\nu_i,\eta^2)$ be the random optimal retention, with a linear predictor of the form

$$\nu_i=\delta_0+\sum_{j=1}^p\delta_jx_{ij}+f(l_i)$$

specific to policyholder $i$, where the function $f(\cdot)$ expresses the effect of occupying level $l_i$ in the bonus-malus scale. Considering the values obtained for the optimal retention in the literature (reaching a maximum somewhere in the middle of the scale, and decreasing when approaching the uppermost and lowermost levels), we will use here a quadratic effect $f$ of $l_i$. Note that other approaches are possible (see the references in the closing section for more details).

This means that policyholder $i$ will report all the accidents with a cost larger than $RL_i(l_i)$, and defray himself all those with a cost less than $RL_i(l_i)$. At the portfolio level, the $RL_i(l_i)$s are assumed to be independent. Now, let $CA_{ik}$ be the cost of the $k$th accident caused by policyholder $i$. We assume that for each $i$ the random variables $CA_{i1},CA_{i2},\ldots$ are independent and identically distributed, with $CA_{ik}\sim\mathcal{LN}or(\mu_i,\sigma^2)$, where $\mu_i=\gamma_0+\sum_{j=1}^p\gamma_jx_{ij}$. Moreover, the $CA_{ik}$s and $RL_i(l_i)$ are mutually independent. We consider here the explanatory variables selected in the LogNormal analysis of the censored claim costs, presented in Table 5.5.

Now, denoting as $c_{i1},\ldots,c_{in_i}$ the costs of the $n_i$ moderate claims filed by policyholder $i$, the likelihood is

$$\mathcal{L}_2=\prod_{i|n_i>0}\prod_{k=1}^{n_i}f_i(c_{ik})$$

where $f_i(\cdot)$ denotes the probability density function of $CA_{ik}$ given $CA_{ik}>RL_i(l_i)$. Each factor involved in the likelihood can be written as

$$f_i(c_{ik})=\frac{\dfrac{1}{\sqrt{2\pi}\,\sigma c_{ik}}\exp\left(-\dfrac{(\ln c_{ik}-\mu_i)^2}{2\sigma^2}\right)\Phi\left(\dfrac{\ln c_{ik}-\nu_i}{\eta}\right)}{1-\Phi\left(-\dfrac{\mu_i-\nu_i}{\sqrt{\sigma^2+\eta^2}}\right)}.$$

The estimators of the parameters of the two linear predictors, together with $\sigma$ and $\eta$, are determined by maximizing the likelihood $\mathcal{L}_2$.
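The likelihood contribution of a reported claim is straightforward to code (a sketch assuming scipy; `mu`, `nu`, `sigma` and `eta` stand for the fitted predictor values and dispersion parameters of one policyholder):

```python
import numpy as np
from scipy.stats import norm

def censored_lognormal_logpdf(c, mu, nu, sigma, eta):
    """Log-density of an accident cost c given that it was reported,
    i.e. given CA > RL, with ln CA ~ N(mu, sigma^2), ln RL ~ N(nu, eta^2)."""
    log_num = (norm.logpdf(np.log(c), loc=mu, scale=sigma) - np.log(c)
               + norm.logcdf((np.log(c) - nu) / eta))
    log_den = norm.logsf(-(mu - nu) / np.sqrt(sigma**2 + eta**2))
    return log_num - log_den

# The total log-likelihood L2 is the sum of these terms over all filed claims;
# it can be maximized numerically, e.g. with scipy.optimize.minimize on the
# negative log-likelihood.
```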

This basic model could be refined in different respects. Firstly, the retention limit could depend on the number of claims previously filed by the policyholder during the same year: the retention for the second claim depends on the level to which the policyholder is transferred after the first claim has been reported. Here, we only use the observations related to the policyholders having filed a single standard claim during the observation period (so that we have the exact cost of this claim at our disposal). Also, the estimation of $f$ could be performed in a nonparametric way, allowing for an effect $f(l_i)$ of being in level $l_i$ (and imposing some smoothness in the $f(l)$s if needed) before selecting an appropriate parametric specification.

Let us now apply this methodology to Portfolio C. The policyholders have been subject to the 23-level former compulsory Belgian bonus-malus scale. The estimations of the regression parameters in the model containing all the explanatory variables are displayed in Table 5.12. Here, we take $f(l)=(l-13)^2$, its regression coefficient being reported as BM level in the tables below. The log-likelihood is −130 154.2565. We see from this table that several covariates are not significant. Therefore, we adopt a backward selection procedure, and exclude the irrelevant covariates. This yields the results displayed in Table 5.13. The log-likelihood is now −130 155.4197. The parameters $\sigma$ and $\eta$ are estimated at $\hat{\sigma}=1.6821$ and $\hat{\eta}=1.0286$.

Compared to the LogNormal fit to the claim costs displayed in Table 5.5, we see that the intercept is now smaller, as expected. The age classes have been modified, and the young drivers seem to cause more expensive accidents. The effect of the covariate City remains approximately the same. The categories for the age of the vehicle have also been modified.

Table 5.12 Fit of the model for the accident costs subject to bonus hunger in Portfolio C, containing all the explanatory variables.

Accident costs ($\mu_i$):
Variable    Level   Coeff     Std error   Wald 95 % conf limits   Chi-sq      Pr>Chi-sq
Intercept           5.8028    0.0453      5.7122    5.8934        16 409.11   < .0001
Ageph       18–24   0.2624    0.0640      0.1344    0.3903        16.82       < .0001
Ageph       > 60    0.0717    0.0500      −0.0282   0.1716        2.06        0.1512
Ageph       25–60   0         0           0         0             .           .
City        Rural   0.0459    0.0275      −0.0091   0.1009        2.79        0.0948
City        Urban   0         0           0         0             .           .
Agev        0–2     0.0327    0.0652      −0.0977   0.1631        0.25        0.6158
Agev        3–5     −0.1721   0.0465      −0.2652   −0.0791       13.69       0.0002
Agev        6–10    −0.1445   0.0357      −0.2159   −0.0732       16.40       0.0001
Agev        > 10    0         0           0         0             .           .

Retention levels ($\nu_i$):
Variable    Level   Coeff     Std error   Wald 95 % conf limits   Chi-sq      Pr>Chi-sq
Intercept           3.5045    0.0904      3.3238    3.6852        1504.45     < .0001
Ageph       18–24   −0.3244   0.1637      −0.6519   0.0031        3.93        0.0476
Ageph       > 60    0.5419    0.1105      0.3209    0.7630        24.05       < .0001
Ageph       25–60   0         0           0         0             .           .
City        Rural   0.0796    0.0416      −0.0036   0.1628        3.66        0.0558
City        Urban   0         0           0         0             .           .
Agev        0–2     −0.7825   0.2621      −1.3067   −0.2582       8.91        0.0028
Agev        3–5     −0.3406   0.1107      −0.5620   −0.1193       9.47        0.0021
Agev        6–10    0.0190    0.0449      −0.0708   0.1087        0.18        0.6725
Agev        > 10    0         0           0         0             .           .
BM level            −0.0011   0.0006      −0.0022   0.0000        3.67        0.0555


Table 5.13 Fit of the final model for the accident costs subject to bonus hunger in Portfolio C.

Accident costs ($\mu_i$):
Variable    Level          Coeff     Std error   Wald 95 % conf limits   Chi-sq      Pr>Chi-sq
Intercept                  5.8084    0.0402      5.7279    5.8889        20 847.38   < .0001
Ageph       18–24          0.2522    0.0661      0.1201    0.3843        14.57       0.0001
Ageph       > 24           0         0           0         0             .           .
City        Rural          0.0692    0.0274      0.0145    0.1240        6.39        0.0115
City        Urban          0         0           0         0             .           .
Agev        3–5            −0.1866   0.0394      −0.2653   −0.1078       22.45       < .0001
Agev        6–10           −0.1582   0.0349      −0.2281   −0.0883       20.51       < .0001
Agev        0–2 & > 10     0         0           0         0             .           .

Retention levels ($\nu_i$):
Variable    Level   Coeff     Std error   Wald 95 % conf limits   Chi-sq     Pr>Chi-sq
Intercept           3.5269    0.0870      3.3529    3.7010        1642.72    < .0001
Ageph       18–24   −0.3077   0.1430      −0.5937   −0.0217       4.63       0.0314
Ageph       > 60    0.6479    0.0852      0.4775    0.8183        57.81      < .0001
Ageph       25–60   0         0           0         0             .          .
Agev        0–2     −0.7126   0.1171      −0.9468   −0.4785       37.04      < .0001
Agev        3–5     −0.3170   0.0666      −0.4503   −0.1837       22.63      < .0001
Agev        > 5     0         0           0         0             .          .
BM level            −0.0011   0.0006      −0.0022   0.0000        3.70       0.0546

Concerning the retention levels, we see that young drivers are more likely to defray only relatively cheap accidents, whereas older drivers are ready to self-defray more expensive accidents. The more recent the vehicle, the fewer accidents are self-defrayed. This may be due to the fact that comprehensive coverage is often bought for new vehicles, so that claims are filed to both third party liability and comprehensive. The effect of the level occupied in the scale is as follows: policyholders occupying the middle of the scale are ready to defray more expensive accidents than policyholders at the top or at the bottom of the scale.

5.4.2 Number of Claims and Number of Accidents

Let $M_i^{\mathrm{small}}$ be the number of small accidents caused by policyholder $i$. The number of moderate claims filed by policyholder $i$ is then given by

$$N_i^{\mathrm{small}}=\sum_{k=1}^{M_i^{\mathrm{small}}}\mathrm{I}\bigl[CA_{ik}>RL_i(l_i)\bigr].$$

By equating the expectations, we get

$$\mathrm{E}\bigl[N_i^{\mathrm{small}}\bigr]=\lambda_i=\sum_{k=0}^{+\infty}\Pr\bigl[M_i^{\mathrm{small}}=k\bigr]\sum_{j=1}^{k}\Pr\bigl[CA_{ij}>RL_i(l_i)\bigr]=\sum_{k=0}^{+\infty}\Pr\bigl[M_i^{\mathrm{small}}=k\bigr]\,k\,\Pr\bigl[CA_{i1}>RL_i(l_i)\bigr]=\Pr\bigl[CA_{i1}>RL_i(l_i)\bigr]\,\mathrm{E}\bigl[M_i^{\mathrm{small}}\bigr].$$


Hence, the expected number of moderate accidents is given by

$$\mathrm{E}\bigl[M_i^{\mathrm{small}}\bigr]=\tilde{\lambda}_i=\frac{\lambda_i}{\Pr[CA_{i1}>RL_i(l_i)]}=\frac{\lambda_i}{1-\Phi\left(-\dfrac{\mu_i-\nu_i}{\sqrt{\sigma^2+\eta^2}}\right)}.$$
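This correction is a one-liner once the model has been fitted (a sketch with scipy; `lam`, `mu` and `nu` are the fitted claim frequency and predictor values of a given risk class, and the default dispersions are the estimates quoted in Section 5.4.1):

```python
from scipy.stats import norm

def accident_frequency(lam, mu, nu, sigma=1.6821, eta=1.0286):
    """Expected accident frequency = claim frequency / Pr[CA > RL]."""
    p_report = norm.sf(-(mu - nu) / (sigma**2 + eta**2) ** 0.5)
    return lam / p_report
```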

The expected annual numbers of minor accidents $\tilde{\lambda}_i$ caused by the policyholders in Portfolio C are displayed in Table 5.14.

The total number of accidents $M_i$ caused by policyholder $i$ is then equal to

$$M_i=M_i^{\mathrm{small}}+N_i^{\mathrm{large}}$$

since all the major accidents are reported to the company. The actual number of claims originating from the $M_i$ accidents is

$$N_i=N_i^{\mathrm{small}}+N_i^{\mathrm{large}}.$$

Table 5.14 Expected annual claim frequency and corresponding expected annual accident frequency for the different risk classes in Portfolio C.

Risk class                  Claim frequency   Accident frequency
18–24 + Rural + 0–2         0.3513403         0.3633521
18–24 + Rural + 3–5         0.3301527         0.3517856
18–24 + Rural + 6–10        0.3513403         0.3824574
18–24 + Rural + > 10        0.3513403         0.3777624
18–24 + Urban + 0–2         0.4408723         0.4572087
18–24 + Urban + 3–5         0.4142855         0.4435002
18–24 + Urban + 6–10        0.4408723         0.4827648
18–24 + Urban + > 10        0.4408723         0.4765034
> 60 + Rural + 0–2          0.1476220         0.1659192
> 60 + Rural + 3–5          0.1387197         0.1683992
> 60 + Rural + 6–10         0.1476220         0.1884610
> 60 + Rural + > 10         0.1476220         0.1831299
> 60 + Urban + 0–2          0.1852406         0.2097930
> 60 + Urban + 3–5          0.1740696         0.2137084
> 60 + Urban + 6–10         0.1852406         0.2396818
> 60 + Urban + > 10         0.1852406         0.2326217
25–60 + Rural + 0–2         0.1845933         0.1964037
25–60 + Rural + 3–5         0.1734614         0.1936230
25–60 + Rural + 6–10        0.1845933         0.2129273
25–60 + Rural + > 10        0.1845933         0.2089540
25–60 + Urban + 0–2         0.2316332         0.2475869
25–60 + Urban + 3–5         0.2176646         0.2447353
25–60 + Urban + 6–10        0.2316332         0.2695796
25–60 + Urban + > 10        0.2316332         0.2643030


Moreover, the random variable $N_i^{\mathrm{large}}$ is independent from $(M_i^{\mathrm{small}},N_i^{\mathrm{small}})$. Both $M_i$ and $N_i$ are mixed Poisson distributed. Specifically, given $\Theta_i=\theta$, $M_i\sim\mathcal{P}oi(\theta\tilde{\lambda}_i+\lambda_i^{\mathrm{large}})$ and $N_i\sim\mathcal{P}oi(\theta\lambda_i+\lambda_i^{\mathrm{large}})$. Note that the variable that has been analysed in the preceding chapters is $N_i$, the number of accidents reported to the insurer, and not $M_i$.

5.4.3 Lemaire Algorithm for the Determination of Optimal Retention Limits

In the preceding section, we explained how to correct the cost of claims to obtain the accident costs. This also allowed us to switch from claim frequencies to accident frequencies. To this end, we estimated the retention limits that were used by the policyholders on the basis of the costs of the claims they filed to the insurance company. The aim of this section is somewhat different. Having the distribution of the accident costs and of the accident frequencies, we would like to determine the optimal claiming strategy (which may differ from the observed claiming strategy inferred in the previous section).

For each level of the scale, a critical claim size is determined: if the cost of the claim falls below this critical threshold then the rational policyholder should not report the accident to the company. Conversely, if the cost exceeds this threshold, the rational policyholder should report the claim to the company. Note the close similarity with deductibles: under coherent behaviour, the bonus-malus scale is equivalent to a set of deductibles depending on the level occupied in the scale.

Cost of Non-Reported Accidents

Let $rl(\vartheta,l)$ be the optimal retention for a policyholder with expected annual accident frequency $\vartheta$ occupying level $l$ in the bonus-malus scale. Here, $rl(\vartheta,l)$ is not a random variable, but an unknown constant to be determined.

Assume that this policyholder has caused an accident with cost $x$ at time $t$, $0\le t<1$. The accident is self-defrayed whenever $x\le rl(\vartheta,l)$, which happens with probability

$$p_\vartheta(l)=\int_{y=0}^{rl(\vartheta,l)}f(y)\,\mathrm{d}y$$

where $f$ denotes the probability density function of the accident costs. The number of claims reported during a period of length $u$ by a policyholder in level $l$ is then Poisson distributed with mean $u\vartheta(1-p_\vartheta(l))$; we denote the corresponding probability of $k$ reported claims as $q_l(k;u)$, and write $q_l(k)$ for $q_l(k;1)$.


The expected cost of a non-reported accident for a policyholder occupying level $l$ is

$$\xi_\vartheta(l)=\frac{1}{p_\vartheta(l)}\int_{y=0}^{rl(\vartheta,l)}y\,f(y)\,\mathrm{d}y.$$

This policyholder will pay on average $\vartheta\,p_\vartheta(l)\times\xi_\vartheta(l)$ per period, because of the accidents not reported to the company. Let us assume that the accident occurrences are uniformly distributed over the year (so that on average they occur in the middle of the year). The average annual total cost borne by a policyholder in level $l$ is then

$$CT\bigl(rl(\vartheta,l)\bigr)=b_l+v^{1/2}\,\vartheta\,p_\vartheta(l)\,\xi_\vartheta(l)$$

where $b_l$ is the premium paid at the beginning of the year, subject to the bonus-malus scale.

Let $V_l(\vartheta)$ be the present value of all the payments made by a policyholder with annual expected claim frequency $\vartheta$ occupying level $l$. The $V_l(\vartheta)$s are obtained from

$$V_l(\vartheta)=CT\bigl(rl(\vartheta,l)\bigr)+v\sum_{k=0}^{\infty}q_l(k)\,V_{T_k(l)}(\vartheta),\qquad l=0,1,\ldots,s.\qquad(5.8)$$

If the policyholder reports all the accidents to the company, the system (5.8) coincides with (5.7). The system (5.8) admits a unique solution. For a given set of optimal retentions, the $V_l(\vartheta)$s give the cost of the strategy, according to the level occupied in the scale.

Lemaire Algorithm

Let us consider a policyholder in level $l$ who has just caused an accident with cost $x$ at time $t$, $0\le t\le 1$. There are two possibilities:

(1) Either he does not claim for the accident, and the expected present cost is

$$v^{-t}\,CT\bigl(rl(\vartheta,l)\bigr)+x+v^{1-t}\sum_{k=0}^{\infty}q_l(k;1-t)\,V_{T_{k+m}(l)}(\vartheta)$$

where $m$ is the number of claims that the policyholder has already filed during the year.

(2) Or he reports the accident to the company, and the expected present cost is

$$v^{-t}\,CT\bigl(rl(\vartheta,l)\bigr)+v^{1-t}\sum_{k=0}^{\infty}q_l(k;1-t)\,V_{T_{k+m+1}(l)}(\vartheta).$$

The retention limit $rl(\vartheta,l)$ is the claim amount $x$ for which the policyholder is indifferent between the two possibilities: the optimal retentions thus solve

$$rl(\vartheta,l)=v^{1-t}\sum_{k=0}^{\infty}q_l(k;1-t)\Bigl(V_{T_{k+m+1}(l)}(\vartheta)-V_{T_{k+m}(l)}(\vartheta)\Bigr)\qquad(5.9)$$

for $l=0,1,\ldots,s$. Note that (5.9) does not provide an explicit expression for the optimal retention since $rl(\vartheta,l)$ also appears in the $q_l(k;1-t)$s.

The optimal strategy is obtained using the following algorithm:


First Iteration

Part A Starting from $rl^{(0)}(\vartheta,l)=0$ for $l=0,\ldots,s$, the strategy consisting of reporting all the accidents to the insurer, (5.8) becomes

$$V_l(\vartheta)=b_l+v\sum_{k=0}^{\infty}\exp(-\vartheta)\frac{\vartheta^k}{k!}\,V_{T_k(l)}(\vartheta)$$

which gives the cost $V^{(0)}$ corresponding to the initial strategy.

Part B An improved strategy can then be obtained from (5.9), which reduces to

$$rl^{(1)}(\vartheta,l)=v^{1-t}\sum_{k=0}^{\infty}\exp\bigl(-(1-t)\vartheta\bigr)\frac{\bigl((1-t)\vartheta\bigr)^k}{k!}\Bigl(V_{T_{k+m+1}(l)}(\vartheta)-V_{T_{k+m}(l)}(\vartheta)\Bigr),$$

$l=0,1,\ldots,s$.

Second Iteration

Part A Inserting the $rl^{(1)}(\vartheta,l)$s in (5.8) gives the cost associated with this strategy. This cost will be smaller than the one associated with the initial strategy.

Part B Inserting the new cost in the system (5.9), we find an improved strategy $rl^{(2)}(\vartheta,l)$, $l=0,\ldots,s$.

Subsequent Iterations

The successive insertion of updated retentions and costs in the systems (5.8)–(5.9) produces a sequence of strategies, with reduced costs.

In all the cases considered in Lemaire (1995), the sequence of the $rl^{(k)}(\vartheta,l)$s converges to the optimal solution with minimum cost. The optimal retention limit is thus a function of the level $l$ occupied in the scale at the beginning of the insurance year, of the discount factor $v$, of the annual expected claim frequency $\vartheta$, of the time $t$ of occurrence of the accident, and of the number $m$ of claims previously reported to the company from the beginning of the insurance period. The optimal strategy is an increasing function of $t$: the optimal retention increases as one approaches the end of the year (and the premium discount if no accidents are reported). The influence of $t$ on the optimal retention limit is much weaker than that of the level $l$, the discount factor $v$ or $\vartheta$. Putting $t=0$ (and so $m=0$) greatly simplifies the computation but leaves the retentions almost unchanged.
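The alternating scheme is easy to implement. The sketch below runs it with $t=0$ and $m=0$ for an illustrative 6-level −1/top scale with hypothetical LogNormal accident-cost parameters and premiums; none of the numbers are the Portfolio C values:

```python
import numpy as np
from scipy.stats import norm, poisson

s, v = 5, 1.0 / 1.04
b = 334.2 * np.array([0.624, 1.302, 1.429, 2.077, 2.414, 3.091])  # hypothetical b_l
mu_c, sig_c = 6.5, 1.3            # hypothetical LogNormal accident-cost parameters
theta = 0.2129                    # accident frequency
T = lambda l, k: max(l - 1, 0) if k == 0 else s   # -1/top transition rule
K = 40                            # truncation of the Poisson sums

def annual_cost(rl):
    """CT(rl(l)) = b_l + v^(1/2) * theta * p(l) * xi(l), cf. (5.8)."""
    z = (np.log(np.maximum(rl, 1e-300)) - mu_c) / sig_c
    p = np.where(rl > 0, norm.cdf(z), 0.0)        # p(l) = Pr[cost <= rl(l)]
    # p(l)*xi(l) is the LogNormal partial expectation E[cost; cost <= rl].
    pxi = np.where(rl > 0, np.exp(mu_c + sig_c**2 / 2) * norm.cdf(z - sig_c), 0.0)
    return b + np.sqrt(v) * theta * pxi, p

def solve_V(rl):
    """Solve the linear system (5.8) for the strategy costs V_l."""
    CT, p = annual_cost(rl)
    lam = theta * (1.0 - p)                       # reported-claim frequency
    A = np.eye(s + 1)
    for l in range(s + 1):
        for k in range(K):
            A[l, T(l, k)] -= v * poisson.pmf(k, lam[l])
    return np.linalg.solve(A, CT), lam

rl = np.zeros(s + 1)              # initial strategy: report every accident
for _ in range(50):               # alternate (5.8) and (5.9) until stable
    V, lam = solve_V(rl)
    rl = np.array([v * sum(poisson.pmf(k, lam[l]) * (V[T(l, k + 1)] - V[T(l, k)])
                           for k in range(K)) for l in range(s + 1)])
print(np.round(rl, 2))            # optimal retention per level
```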

The optimal retentions coming from the Lemaire algorithm should not be considered as being the real threshold above which policyholders report the accident to the insurance company. Indeed, this algorithm postulates a high degree of rationality behind individual behaviours. It is enough to have a look at insurance statistics to see that some claims concern accidents bearing a cost much lower than the optimal retention, which contradicts the assumptions behind the Lemaire algorithm. The output of the Lemaire algorithm should rather be understood as a measure of toughness for a particular bonus-malus system. Note that Walhin & Paris (2000, 2001) softened this requirement by assuming that there was a proportion of the policyholders complying with the Lemaire claiming rule, and the remainder reporting all the accidents, whatever their cost.


254 <strong>Actuarial</strong> <strong>Modelling</strong> <strong>of</strong> <strong>Claim</strong> <strong>Counts</strong><br />

Note also that this approach requires knowledge of the uncensored distribution for claim costs and claim counts, which is usually not available in practice. The probability density function $f$ corresponds to the cost of an accident (and not to the cost of a claim), and the distribution of the number of accidents is actually needed (not only the distribution of the number of claims that has been studied in the preceding chapters). Hence, the methodology described in Sections 5.4.1–5.4.2 has first to be applied to obtain the uncensored accident distribution.

Application of the Lemaire Algorithm to Portfolio C

The Lemaire algorithm can be applied with the distribution obtained for the cost of accidents (i.e. with the help of the LogNormal model with corrected regression coefficients) after having transformed the claim frequencies for Portfolio C into accident frequencies.

Let us consider the −1/+2 bonus-malus scale, with relativities 62.4 % for level 0, 130.2 % for level 1, 142.9 % for level 2, 207.7 % for level 3, 241.4 % for level 4, and 309.1 % for level 5. We consider here a discount rate of 4 %.

Let us consider an individual aged between 25 and 60, living in a rural area, and driving a vehicle between 6 and 10 years old. His claim frequency is 18.46 %. His accident frequency is 21.29 %. The base pure premium is taken as the product of the claim frequency times the grand mean of all the claim sizes (large and moderate ones), that is, 0.1846 × €1810.63. The optimal retentions are as follows:

Level l    Optimal retention
0          €574.75
1          €1050.39
2          €1341.16
3          €1760.77
4          €1260.69
5          €693.70

We see that this policyholder should defray accidents with a cost up to €1760.77 if he occupied level 3.

Let us now consider an individual aged over 60, living in an urban area, and driving a vehicle between 3 and 5 years old. His claim frequency is 17.41 %. His accident frequency is 21.37 %. The base premium amounts to 0.1741 × €1810.63. The optimal retentions are as follows:

Level l    Optimal retention
0          €539.89
1          €987.74
2          €1262.84
3          €1660.53
4          €1189.90
5          €655.34

The optimal retentions are now slightly smaller than before.



5.5 Further Reading and Bibliographic Notes

5.5.1 Modelling Claim Amounts in Related Coverages

In this chapter, only techniques for motor third party liability have been described. Besides the compulsory motor third party liability insurance, a number of related coverages are proposed to the drivers (like medical benefits, uninsured or underinsured motorist coverage, theft, and collision and other-than-collision insurance). The problem caused by the late settlement of large claims generally disappears when optional coverages are considered. The annual claim amount $S_i$ produced by policyholder $i$ is then represented as

$$S_i = \sum_{k=1}^{N_i} C_{ik}$$

where $N_i$ is the number of claims, and the $C_{ik}$'s are the corresponding claim sizes. The $S_i$'s are assumed to be mutually independent, the $C_{ik}$'s to be independent and identically distributed for fixed $i$, and independent of $N_i$.
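As a toy illustration of this collective model, the following sketch simulates the annual amounts $S_i$ under hypothetical Poisson claim counts and Gamma claim sizes (all parameter values are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical parameters: N_i ~ Poisson(0.15), C_ik ~ Gamma(2, scale=600).
lam, shape, scale, n_pol = 0.15, 2.0, 600.0, 50_000
counts = rng.poisson(lam, size=n_pol)                  # claim numbers N_i
s = np.array([rng.gamma(shape, scale, k).sum() for k in counts])

# E[S_i] = E[N_i] E[C_i1] = lam * shape * scale = 180:
print(round(s.mean(), 1), lam * shape * scale)
```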

The analysis of the $N_i$'s usually starts with a Poisson regression model. Then, the residual heterogeneity is taken into account by the inclusion of a random effect. The impact of the deductibles has to be carefully assessed for optional coverages. In case deductibles are specified in the insurance policies, the actuary has to keep in mind that the statistical summaries relate to conditional distributions (given that the claim costs exceed the corresponding deductibles).

According to the type of coverage, different models can be used for the $C_{ik}$'s. The claim size is usually expressed as a percentage of the sum insured. For collision insurance, the $C_{ik}$'s can be decomposed as

$$C_{ik} = \big(J_{ik} + (1 - J_{ik})\,P_{ik}\big)\, v_i$$

where $v_i$ is the sum insured for policy $i$ (the value of the vehicle fixed according to the rules contained in the policy); $J_{ik} = 1$ if the $k$th claim generates a total loss (that is, a loss that exhausts the sum insured, i.e. $C_{ik} = v_i$), and 0 otherwise; and $0 < P_{ik} < 1$ in the case of a partial loss.


5.5.2 Tweedie Generalized Linear Model

The Tweedie family includes such continuous distributions as the Normal and Gamma, the purely discrete scaled Poisson distribution, as well as the class of mixed compound Poisson distributions with Gamma summands. The name Tweedie has been associated with this family by Jorgensen (1987, 1997) in honour of the pioneering works by Tweedie (1984).

In nonlife ratemaking, the Tweedie model is very convenient for risk classification. However, it does not allow the actuary to isolate the frequency part of the pure premium, and thus does not provide the actuary with the input for the design of bonus-malus scales. This is why in this book (which is mainly devoted to motor insurance pricing) we favoured the separate analysis of claim frequencies and claim costs. For other insurance products, where only the total amount of claims is available for actuarial analysis, the Tweedie distribution is an excellent candidate for loss modelling.

In the actuarial literature, Jorgensen & Paes de Souza (1994) assumed Poisson arrival of claims and Gamma distributed costs for individual claims. These authors directly modelled the risk, or expected cost of claims per insured unit, using the Tweedie Generalized Linear Model. Smyth & Jorgensen (2002) observed that, when modelling the cost of insurance claims, it is generally necessary to model the dispersion of the costs as well as their mean. In order to model the dispersion, these authors used the framework of double generalized linear models. Modelling the dispersion increases the precision of the estimated tariffs. The use of double generalized linear models also allows the actuary to handle the case where only the total cost of claims, and not the number of claims, has been recorded.

5.5.3 Large Claims

The analysis of large losses performed in this chapter is based on Cebrian, Denuit & Lambert (2003). Large losses are modelled using the Generalized Pareto distribution, and the main concern is to determine the threshold between small and large losses. An alternative has been developed by Buch-Kromann (2006) based on Buch-Larsen, Nielsen, Guillén & Bolancé (2005). This approach is based on a Champernowne distribution, corrected with a nonparametric estimator (that is obtained by transforming the data set with the estimated modified Champernowne distribution function and then estimating the density of the transformed data set using the classical kernel density estimator). Based on the analysis of a Danish data set, Buch-Kromann (2006) concluded that the Generalized Pareto approach performs better than the Champernowne one in terms of goodness-of-fit, whereas both methods are comparable in terms of predicting future claims.

Another approach is proposed by Cooray & Ananda (2005), who combined a LogNormal probability density function with a Pareto one. Specifically, these authors introduced a two-parameter smooth continuous composite LogNormal-Pareto model that is a two-parameter LogNormal density up to an unknown threshold value and a two-parameter Pareto density for the remainder. Continuity and differentiability are imposed at the unknown threshold to ensure that the resulting probability density function is smooth, reducing the number of parameters from four to two. The resulting two-parameter probability density function is similar in shape to the LogNormal density, yet its upper tail is thicker than the LogNormal density (and accommodates the large losses observed in liability insurance). This approach clearly outperforms the one proposed in this chapter, in that all the parameters (including the threshold) are estimated in the same model. The estimates obtained with the methodology developed in this book can be used as starting values in the likelihood maximization. Note however that Cooray & Ananda (2005) did not consider the case where explanatory variables are available, so that their approach has to be extended to this more realistic situation.

5.5.4 Alternative Approaches to Risk Classification

There are numerous techniques applied to the modelling of insurance losses. Early references in actuarial science include ter Berg (1980a,b). Various mathematical and statistical models for the estimation of automobile insurance pricing are reviewed in Weisberg, Tomberlin & Chatterjee (1984). The methods are compared on their predictive ability, based on two sets of automobile insurance data for two different states collected over two different periods. The issue of model complexity versus data availability is resolved through a comparison of the accuracy of prediction. The models reviewed range from the use of simple cell means to various multiplicative-additive schemes to the empirical Bayes approach. The empirical Bayes approach, with prediction based on both model-based and individual cell estimates, seems to yield the best forecast. See also Jee (1989).

Williams & Huang (1996) applied KDD (Knowledge Discovery in Databases) techniques to insurance risk assessment. Daengdej, Lukose & Murison (1999) considered CBR (Case-Based Reasoning) techniques for claim predictions.

Classification techniques are also often used for risk classification. Retzlaff-Roberts & Puelz (1996) adopted an efficiency approach to the two-group linear programming method of discriminant analysis, using principles taken from data envelopment analysis, to predict group membership in an insurance underwriting scheme. Yeo et al. (2001) applied clustering techniques before modelling insurance losses.

5.5.5 Efficiency

The efficiency (or elasticity) of a bonus-malus system was first studied by Loimaranta (1972) and De Pril (1978). Note however that in these papers, the unknown individual risk factors are not viewed as random variables. These authors work in the fixed effects model, which is close in spirit to the limited fluctuations credibility theory. Their efficiency concepts are therefore entirely different from the notion of efficiency proposed in Norberg (1976).

Other efficiency measures have been proposed in the literature. For instance, Heras et al. (2002) evaluated the asymptotic fairness of bonus-malus systems (i.e., their ability to assess the individual risk in the long run), assuming the simplest case where there is no hunger for bonus.

5.5.6 Optimal Retention Limits and Bonus Hunger

The problem of determining the optimal critical claim size has been the topic of several papers. De Leve & Weeda (1968) considered a −1/top bonus-malus scale, so that the decision to file or not has to be made only if no claim has been made during the same period. Lemaire (1976, 1977) studied the hunger for bonus and proposed a dynamic programming algorithm to determine the optimal claiming behaviour. De Pril (1979) considered that the claims were generated by a Poisson process and adopted a continuous-time approach. Specifically, De Pril (1979) defined $L_n(l, k, t)$ as the amount that the cost of the actual accident must exceed in order to justify the filing of a claim, if the policyholder is at time $t$ of period $n$ in level $l$ and has already filed $k$ claims. The optimal value of $L_n(l, k, t)$ is determined to minimize the discounted expectation of the total future costs (premiums and self-defrayed accidents) for the policyholder. After De Pril (1979), De Pril & Goovaerts (1983) determined bounds for the optimal critical claim size when only incomplete information about the claim amount distribution is available. They considered −1/top bonus-malus scales.

Dellaert et al. (1990) proved that under mild conditions the optimal decision rule is to claim for damages with amount above a certain limit. In some instances, policyholders are allowed to decide at the end of an insurance year which damages occurred during the year should be claimed; see, e.g., Martin-Löf (1973). This means that the policyholder has perfect information about the number of accidents and the corresponding damages at the moment he/she decides which damages to claim. This situation has been investigated by Dellaert et al. (1991). Let us also mention that Dellaert et al. (1993) considered damage insurance (where, in addition to the bonus hunger phenomenon, the optimal stopping rule to terminate the insurance has to be determined).

Holtan (2001) envisaged the loss of bonus after a claim as a rate of interest paid from the customer to the insurer, and studied the hunger for bonus from this viewpoint.

Optimal claiming rules have also been considered in Operational Research, using Markov decision processes. When a driver is involved in a motor accident, decisions have to be made as to whether or not a claim should be made. Hastings (1976) considered this problem as a Markov decision process with the expected cost over a finite horizon (where the relevant costs are repair costs and premium costs) as objective function. See also Haehling von Lanzenauer (1974) and Hey (1985). Norman & Shearn (1980) proposed the following simple rule of thumb, which is shown to work well in their case: irrespective of when the accident occurs, claim only if the amount of the claim exceeds the difference over the next 4 years between the total premiums payable if a claim is made and those payable if it is not, assuming that no further claims will be made. Chappell & Norman (1989) demonstrated that this simple rule is less efficient in the case of protected bonus. Kolderman & Volgenant (1985) examined the same problem under the assumption that only one claim is needed to change the insurance premium category, so that any extra claims in the year have no effect on the current repair estimate for an accident.

Walhin & Paris (2000) derived the actual claim amount and frequency distributions within a bonus-malus system. As explained above, policyholders should defray the small claims themselves to avoid the penalties induced by the bonus-malus system. Consequently, there are more accidents than claims filed with the insurer: the insurance data are censored. The kind of censorship is nevertheless very particular, and much more complicated than the phenomena encountered in classical nonlife problems (where losses are censored because they exceed some policy limit, or fall below a given deductible). The procedure described in Section 5.4.1 is taken from Denuit, Maréchal, Pitrebois & Walhin (2007a), where alternative approaches to obtain uncensored accident distributions can be found.

Even if the bonus-hunger phenomenon has been extensively studied in connection with bonus-malus scales, the same idea applies to credibility systems. See, e.g., Norberg (1975) and Sundt (1988) for an illustration.


6

Multi-Event Systems

6.1 Introduction

The majority of bonus-malus systems in force throughout the world penalize the number of reported claims, without taking the cost of these claims into account. This can be considered as a shortcoming, since large claims should intuitively be more severely penalized. The first part of this chapter aims to develop credibility models that allow us to subdivide the claims into two categories, small and large losses (the extension to more than two categories is easy). Instead of determining a limiting amount to decide whether a loss should be qualified as large instead of small (such a criterion would lead to substantial practical problems, due to the time needed to evaluate the cost of the claim), we distinguish the accidents that caused property damage only from those that caused bodily injuries. Since the latter cost much more on average, this approach implicitly integrates the cost of the claim in a posteriori premium corrections. The Bayesian credibility approach turns out to lead to numerical integration. This is why we favour the linear credibility approach. Linear credibility formulas are developed to update the expected claim frequencies given past claim histories.

The second part of this chapter is devoted to bonus-malus scales with several types of events (assuming a Multinomial partitioning scheme). As mentioned above, all the classical bonus-malus systems are based on a single type of event: the occurrence of claims at fault, regardless of their severity or whether the policyholders are only partially liable for them. This over-simplification can be regarded as problematic for commercial purposes: it seems desirable to integrate the severity of the claims and to recognize the partial liability of the policyholder. For example, the bonus-malus system in force in France (studied in Chapter 9) entails a reduced penalty if the policyholder is only partially liable for the claim.

Prominent examples of a posteriori ratemaking mechanisms based on several types of events are provided by the experience rating systems in force in North America. These systems not only incorporate accidents at fault but also elements of the policyholders' driving record. For instance, the Massachusetts safe driver insurance plan encourages safe driving by rewarding drivers who do not cause an accident or incur a traffic law violation. It is based on several types of events (major and minor at-fault accidents and traffic violations). Specifically, each policyholder is assigned a level between 9 and 35, based on his driving record during the previous six years. A new driver begins at level 15 (relativity of 100 %). Occupying any level below 15 entails a premium discount, while above level 15 the driver pays a surcharge. For each incident-free year of driving, the policyholder goes down one level. The driver will move up a certain number of levels based on the type of incident: two levels for a minor traffic violation, three levels for a minor at-fault accident, four levels for a major at-fault accident and five levels for a major traffic violation. The Massachusetts system 'forgets' all incidents after six years.
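The level-updating rule just described is easily turned into code. The sketch below is our own reading of the plan (event labels are hypothetical, and the six-year forgetting of incidents is not modelled):

```python
# Points added per event type in the Massachusetts-style plan described above.
STEPS = {"minor_violation": 2, "minor_accident": 3,
         "major_accident": 4, "major_violation": 5}

def new_level(level, events, lowest=9, highest=35):
    """One level down after an incident-free year, otherwise up by the sum
    of the penalties attached to this year's events."""
    move = -1 if not events else sum(STEPS[e] for e in events)
    return min(max(level + move, lowest), highest)

level = 15                                   # a new driver starts at level 15
for year_events in ([], [], ["minor_accident"], []):
    level = new_level(level, year_events)
print(level)                                 # 15 -> 14 -> 13 -> 16 -> 15
```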

This chapter addresses the actuarial modelling of such systems, with penalties depending on different types of events. As before, the modelling uses the concept of Markov Chains. We will see that under mild assumptions, the trajectory of each policyholder in the scale can be modelled with the aid of discrete-time Markov processes. The relativities associated with each level will then be computed using the maximum accuracy principle discussed in Chapter 4.

6.2 Multi-Event Credibility Models

6.2.1 Dichotomy

Let $N_{it}^{mat}$ be the number of claims with material damage only, reported by policyholder $i$ during period $t$. Similarly, let $N_{it}^{bod}$ be the number of claims with bodily injuries, and let

$$N_{it}^{tot} = N_{it}^{mat} + N_{it}^{bod}$$

be the total number of claims. Policyholder $i$, $i = 1, \ldots, n$, is assumed to have been observed during $T_i$ periods. In the previous chapters, $N_{it}^{tot}$ was the variable of interest. Here, a dichotomy is operated, and we study $N_{it}^{mat}$ and $N_{it}^{bod}$ separately.

6.2.2 Multivariate Claim Count Model

Overdispersion and possible dependence between $N_{it}^{mat}$ and $N_{it}^{bod}$ are introduced via possibly correlated random effects $\Theta_i^{mat}$ and $\Theta_i^{bod}$ such that $E[\Theta_i^{mat}] = E[\Theta_i^{bod}] = 1$. Specifically, given $\Theta_i^{mat} = \theta_i^{mat}$, we assume that

$$N_{it}^{mat} \sim \mathcal{P}oi\big(\lambda_{it}^{mat}\,\theta_i^{mat}\big) \quad\text{where}\quad \lambda_{it}^{mat} = d_{it}\exp\big(\text{score}_{it}^{mat}\big),$$

and given $\Theta_i^{bod} = \theta_i^{bod}$, we assume that

$$N_{it}^{bod} \sim \mathcal{P}oi\big(\lambda_{it}^{bod}\,\theta_i^{bod}\big) \quad\text{where}\quad \lambda_{it}^{bod} = d_{it}\exp\big(\text{score}_{it}^{bod}\big).$$

Both scores are linear combinations of explanatory variables specific to policyholder $i$ and year $t$ (summarized in a vector $\boldsymbol{x}_{it}$). Specifically,

$$\text{score}_{it}^{mat} = \beta_0^{mat} + \sum_{j=1}^{p} \beta_j^{mat} x_{itj} \quad\text{and}\quad \text{score}_{it}^{bod} = \beta_0^{bod} + \sum_{j=1}^{p} \beta_j^{bod} x_{itj}.$$

We make the following assumptions about the dependence structure of the random variables:

(i) Given $\Theta_i^{bod}$, the $N_{it}^{bod}$'s, $t = 1, 2, \ldots, T_i$, are independent.

(ii) Given $\Theta_i^{mat}$, the $N_{it}^{mat}$'s, $t = 1, 2, \ldots, T_i$, are independent.

(iii) Given $\Theta_i^{mat}$ and $\Theta_i^{bod}$, the sequences $\{N_{it}^{bod},\ t = 1, 2, \ldots, T_i\}$ and $\{N_{it}^{mat},\ t = 1, 2, \ldots, T_i\}$ are independent.

As explained in the previous chapter, overdispersion and serial correlation are induced by missing explanatory variables, whose effect is modelled with the help of the random effects $\Theta_i^{mat}$ and $\Theta_i^{bod}$.
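A small simulation may help to fix ideas. The sketch below draws correlated unit-mean random effects (a LogNormal specification is one possible choice, since the model only constrains the first two moments) and then conditionally independent Poisson counts; the variance and covariance values are hypothetical, and the final print anticipates the covariance formula derived in Section 6.2.6:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_counts(lam_mat, lam_bod, s2_mat=0.8, s2_bod=1.1, s_bm=0.6, n=500_000):
    """Draw correlated unit-mean random effects, then conditionally
    independent Poisson counts (all parameter values are hypothetical)."""
    cov = np.array([[s2_mat, s_bm], [s_bm, s2_bod]])
    c = np.log1p(cov)                 # LogNormal trick: Cov(exp Z) = exp(C) - 1
    z = rng.multivariate_normal(-0.5 * np.diag(c), c, size=n)
    theta = np.exp(z)                 # E[theta] = (1, 1) by construction
    return rng.poisson(lam_mat * theta[:, 0]), rng.poisson(lam_bod * theta[:, 1])

n_mat, n_bod = simulate_counts(0.13, 0.016)
# Anticipating Section 6.2.6: Cov = lam_mat * lam_bod * sigma_bm = 0.001248
print(np.cov(n_mat, n_bod)[0, 1])
```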

6.2.3 Bayesian Credibility Approach

The Bayesian approach requires numerical integration, which sometimes prevents the practical implementation of the resulting formulas (even if, nowadays, numerical integration has become straightforward with modern computers). It consists of deriving the conditional distribution of the random effects $\Theta_i^{mat}$ and $\Theta_i^{bod}$, given the past claims history. This conditional distribution then drives a posteriori premium corrections.

Let us denote as

$$k_{i\bullet}^{mat} = \sum_{t=1}^{T_i} k_{it}^{mat}$$

the total number of claims with material damage only filed by policyholder $i$ during the $T_i$ coverage periods, and as

$$k_{i\bullet}^{bod} = \sum_{t=1}^{T_i} k_{it}^{bod}$$

the total number of claims with bodily injuries filed by this policyholder. The corresponding expected claim frequencies are

$$\lambda_{i\bullet}^{mat} = \sum_{t=1}^{T_i} \lambda_{it}^{mat} \quad\text{and}\quad \lambda_{i\bullet}^{bod} = \sum_{t=1}^{T_i} \lambda_{it}^{bod}.$$

The joint distribution of the $N_{it}^{mat}$'s and $N_{it}^{bod}$'s is given by

$$\begin{aligned}
\Pr\big[N_{it}^{mat} &= k_{it}^{mat},\, N_{it}^{bod} = k_{it}^{bod} \text{ for } t = 1, \ldots, T_i\big] \\
&= \int_0^{\infty}\!\!\int_0^{\infty} \exp\big(-\theta^{mat}\lambda_{i\bullet}^{mat} - \theta^{bod}\lambda_{i\bullet}^{bod}\big)\, \big(\theta^{mat}\big)^{k_{i\bullet}^{mat}} \big(\theta^{bod}\big)^{k_{i\bullet}^{bod}} \\
&\qquad\times \frac{\prod_{t=1}^{T_i} \big(\lambda_{it}^{mat}\big)^{k_{it}^{mat}} \big(\lambda_{it}^{bod}\big)^{k_{it}^{bod}}}{\prod_{t=1}^{T_i} k_{it}^{mat}!\,k_{it}^{bod}!}\, f\big(\theta^{mat}, \theta^{bod}\big)\, d\theta^{mat}\, d\theta^{bod}
\end{aligned}$$



where $f(\cdot, \cdot)$ is the bivariate probability density function of the couple of random effects $(\Theta_i^{mat}, \Theta_i^{bod})$. The joint probability density function of $\Theta_i^{mat}$, $\Theta_i^{bod}$, $N_{it}^{mat}$, $N_{it}^{bod}$, $t = 1, \ldots, T_i$, is given by

$$\exp\big(-\theta^{mat}\lambda_{i\bullet}^{mat} - \theta^{bod}\lambda_{i\bullet}^{bod}\big)\, \big(\theta^{mat}\big)^{k_{i\bullet}^{mat}} \big(\theta^{bod}\big)^{k_{i\bullet}^{bod}}\, \frac{\prod_{t=1}^{T_i} \big(\lambda_{it}^{mat}\big)^{k_{it}^{mat}} \big(\lambda_{it}^{bod}\big)^{k_{it}^{bod}}}{\prod_{t=1}^{T_i} k_{it}^{mat}!\,k_{it}^{bod}!}\, f\big(\theta^{mat}, \theta^{bod}\big)$$

so that the conditional probability density function of $(\Theta_i^{mat}, \Theta_i^{bod})$ given past claims history is

$$\frac{\exp\big(-\theta^{mat}\lambda_{i\bullet}^{mat} - \theta^{bod}\lambda_{i\bullet}^{bod}\big)\, \big(\theta^{mat}\big)^{k_{i\bullet}^{mat}} \big(\theta^{bod}\big)^{k_{i\bullet}^{bod}}\, f\big(\theta^{mat}, \theta^{bod}\big)}{\int_0^{\infty}\!\int_0^{\infty} \exp\big(-\xi_1\lambda_{i\bullet}^{mat} - \xi_2\lambda_{i\bullet}^{bod}\big)\, \xi_1^{k_{i\bullet}^{mat}}\, \xi_2^{k_{i\bullet}^{bod}}\, f(\xi_1, \xi_2)\, d\xi_1\, d\xi_2}. \tag{6.1}$$

The posterior distribution of $(\Theta_i^{mat}, \Theta_i^{bod})$ then allows for a posteriori corrections, as explained in Chapter 3.
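Formula (6.1) can be evaluated numerically without difficulty. The following sketch computes the posterior means of $\Theta_i^{mat}$ and $\Theta_i^{bod}$ by Monte Carlo, weighting prior draws by the Poisson likelihood appearing in (6.1); the prior reuses the hypothetical correlated LogNormal construction sketched in Section 6.2.2:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def posterior_means(k_mat, k_bod, lam_mat, lam_bod, theta):
    """Monte Carlo version of (6.1): weight prior draws `theta` (an (n, 2)
    array) by exp(-t1*lam_mat - t2*lam_bod) * t1**k_mat * t2**k_bod."""
    w = (np.exp(-theta[:, 0] * lam_mat - theta[:, 1] * lam_bod)
         * theta[:, 0] ** k_mat * theta[:, 1] ** k_bod)
    return (theta * w[:, None]).sum(axis=0) / w.sum()

# Prior draws: the correlated LogNormal construction of the previous sketch.
c = np.log1p(np.array([[0.8, 0.6], [0.6, 1.1]]))
theta = np.exp(rng.multivariate_normal(-0.5 * np.diag(c), c, size=1_000_000))

# Three material claims, no bodily injury claim, in five years of exposure:
print(posterior_means(3, 0, 5 * 0.13, 5 * 0.016, theta))
```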

6.2.4 Summary of Past Claims Histories

Denote as

$$N_{i\bullet}^{bod} = \sum_{t=1}^{T_i} N_{it}^{bod} \quad\text{and}\quad N_{i\bullet}^{mat} = \sum_{t=1}^{T_i} N_{it}^{mat}$$

the total claim numbers of each category caused by policyholder $i$ during the observation period. Since the random effects $\Theta_i^{mat}$ and $\Theta_i^{bod}$ do not vary with time, $N_{i\bullet}^{mat}$ and $N_{i\bullet}^{bod}$ are sufficient summaries of past claims histories (in the sense that the posterior distributions of $\Theta_i^{mat}$ and $\Theta_i^{bod}$, as well as the predictive distributions of $N_{i,T_i+1}^{mat}$ and $N_{i,T_i+1}^{bod}$, only depend on $N_{i\bullet}^{mat}$ and $N_{i\bullet}^{bod}$; see (6.1), where past claims histories enter through $k_{i\bullet}^{mat}$ and $k_{i\bullet}^{bod}$).

Clearly, $\lambda_{i\bullet}^{mat} = E[N_{i\bullet}^{mat}]$ and $\lambda_{i\bullet}^{bod} = E[N_{i\bullet}^{bod}]$. It is then easy to see that given $\Theta_i^{mat} = \theta_i^{mat}$,

$$N_{i\bullet}^{mat} \sim \mathcal{P}oi\big(\lambda_{i\bullet}^{mat}\,\theta_i^{mat}\big),$$

and that given $\Theta_i^{bod} = \theta_i^{bod}$,

$$N_{i\bullet}^{bod} \sim \mathcal{P}oi\big(\lambda_{i\bullet}^{bod}\,\theta_i^{bod}\big),$$

invoking the conditional independence of the annual claim numbers of each category and the stability of the Poisson family under convolution. Therefore, $N_{i\bullet}^{mat}$ and $N_{i\bullet}^{bod}$ are both mixed Poisson distributed.



6.2.5 Variance-Covariance Structure of the Random Effects

For deriving linear credibility formulas, we only need the moment structure of the risk variables. Let us introduce the variance-covariance matrix of $(\Theta_i^{bod}, \Theta_i^{mat})$, that is denoted as

$$\Sigma = \begin{pmatrix} \sigma_{bod}^2 & \sigma_{bm} \\ \sigma_{bm} & \sigma_{mat}^2 \end{pmatrix}.$$

In words, $\sigma_{bod}^2$ and $\sigma_{mat}^2$ are the variances of $\Theta_i^{bod}$ and $\Theta_i^{mat}$, respectively, and $\sigma_{bm}$ is the covariance between $\Theta_i^{bod}$ and $\Theta_i^{mat}$. Note that the inequalities

$$\sigma_{bod}^2 \geq 0, \qquad \sigma_{mat}^2 \geq 0 \qquad\text{and}\qquad |\sigma_{bm}| \leq \sigma_{bod}\,\sigma_{mat}$$

must be fulfilled to ensure that $\Sigma$ is positive definite. The estimated variances and covariance have to fulfill the same constraints. If not, this rules out the linear credibility model.

6.2.6 Variance-Covariance Structure of the Annual Claim Numbers

Let us now compute the variance and covariance of $N_{i\bullet}^{mat}$ and $N_{i\bullet}^{bod}$. Since $N_{i\bullet}^{mat} \sim \mathcal{MP}oi\big(\lambda_{i\bullet}^{mat}, \Theta_i^{mat}\big)$, we have

$$V[N_{i\bullet}^{mat}] = \lambda_{i\bullet}^{mat} + \big(\lambda_{i\bullet}^{mat}\big)^2 \sigma_{mat}^2.$$

Similarly, from $N_{i\bullet}^{bod} \sim \mathcal{MP}oi\big(\lambda_{i\bullet}^{bod}, \Theta_i^{bod}\big)$ we get

$$V[N_{i\bullet}^{bod}] = \lambda_{i\bullet}^{bod} + \big(\lambda_{i\bullet}^{bod}\big)^2 \sigma_{bod}^2.$$

To have an idea about the dependence existing between the numbers of claims with material damage only and with bodily injuries, let us now compute the covariance between $N_{i\bullet}^{mat}$ and $N_{i\bullet}^{bod}$:

$$\begin{aligned}
C\big[N_{i\bullet}^{mat}, N_{i\bullet}^{bod}\big] &= E\Big[\big(N_{i\bullet}^{mat} - \lambda_{i\bullet}^{mat}\big)\big(N_{i\bullet}^{bod} - \lambda_{i\bullet}^{bod}\big)\Big] \\
&= E\Big[E\big[N_{i\bullet}^{mat} - \lambda_{i\bullet}^{mat}\,\big|\,\Theta_i^{mat}\big]\, E\big[N_{i\bullet}^{bod} - \lambda_{i\bullet}^{bod}\,\big|\,\Theta_i^{bod}\big]\Big] \\
&= E\Big[\lambda_{i\bullet}^{mat}\big(\Theta_i^{mat} - 1\big)\,\lambda_{i\bullet}^{bod}\big(\Theta_i^{bod} - 1\big)\Big] \\
&= \lambda_{i\bullet}^{mat}\,\lambda_{i\bullet}^{bod}\,\sigma_{bm}.
\end{aligned}$$

As expected, the covariance $\sigma_{bm}$ between $\Theta_i^{mat}$ and $\Theta_i^{bod}$ drives the covariance of $N_{i\bullet}^{mat}$ and $N_{i\bullet}^{bod}$.



6.2.7 Estimation of the Variances and Covariances

The formulas derived above for $V[N_{i\bullet}^{mat}]$, $V[N_{i\bullet}^{bod}]$ and $C[N_{i\bullet}^{mat}, N_{i\bullet}^{bod}]$ suggest the following estimates for the parameters $\sigma_{mat}^2$, $\sigma_{bod}^2$ and $\sigma_{bm}$:

$$\widehat{\sigma}_{mat}^2 = \frac{\sum_{i=1}^{n}\Big(\big(k_{i\bullet}^{mat} - \widehat{\lambda}_{i\bullet}^{mat}\big)^2 - k_{i\bullet}^{mat}\Big)}{\sum_{i=1}^{n}\big(\widehat{\lambda}_{i\bullet}^{mat}\big)^2}$$

$$\widehat{\sigma}_{bod}^2 = \frac{\sum_{i=1}^{n}\Big(\big(k_{i\bullet}^{bod} - \widehat{\lambda}_{i\bullet}^{bod}\big)^2 - k_{i\bullet}^{bod}\Big)}{\sum_{i=1}^{n}\big(\widehat{\lambda}_{i\bullet}^{bod}\big)^2}$$

$$\widehat{\sigma}_{bm} = \frac{\sum_{i=1}^{n}\big(k_{i\bullet}^{mat} - \widehat{\lambda}_{i\bullet}^{mat}\big)\big(k_{i\bullet}^{bod} - \widehat{\lambda}_{i\bullet}^{bod}\big)}{\sum_{i=1}^{n}\widehat{\lambda}_{i\bullet}^{mat}\,\widehat{\lambda}_{i\bullet}^{bod}}$$

that are consistent in the random effects model.

The parameters $\sigma_{mat}^2$, $\sigma_{bod}^2$ and $\sigma_{bm}$ have been estimated above on aggregate data, giving the estimators $\widehat{\sigma}_{mat}^2$, $\widehat{\sigma}_{bod}^2$ and $\widehat{\sigma}_{bm}$. Alternatively, these parameters could be estimated from individual data as follows:

$$\widetilde{\sigma}_{mat}^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\Big(\big(k_{it}^{mat} - \widehat{\lambda}_{it}^{mat}\big)^2 - k_{it}^{mat}\Big)}{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\big(\widehat{\lambda}_{it}^{mat}\big)^2}$$

$$\widetilde{\sigma}_{bod}^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\Big(\big(k_{it}^{bod} - \widehat{\lambda}_{it}^{bod}\big)^2 - k_{it}^{bod}\Big)}{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\big(\widehat{\lambda}_{it}^{bod}\big)^2}$$

$$\widetilde{\sigma}_{bm} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\big(k_{it}^{mat} - \widehat{\lambda}_{it}^{mat}\big)\big(k_{it}^{bod} - \widehat{\lambda}_{it}^{bod}\big)}{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\widehat{\lambda}_{it}^{mat}\,\widehat{\lambda}_{it}^{bod}}$$

The estimators $\widehat{\sigma}_{mat}^2$, $\widehat{\sigma}_{bod}^2$ and $\widehat{\sigma}_{bm}$ are preferred over $\widetilde{\sigma}_{mat}^2$, $\widetilde{\sigma}_{bod}^2$ and $\widetilde{\sigma}_{bm}$, respectively, since the variances of the former are smaller. As shown by Pinquet et al. (2001), the condition $0 < \widehat{\sigma}_{mat}^2 < \widetilde{\sigma}_{mat}^2$ is necessary for the introduction of dynamic random effects.
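The aggregate-data estimators translate directly into code; a minimal sketch (array names are ours):

```python
import numpy as np

def estimate_sigma(k_mat, k_bod, lam_mat, lam_bod):
    """Moment estimators of sigma^2_mat, sigma^2_bod and sigma_bm from the
    per-policy totals k and fitted frequencies lambda-hat (1-d arrays)."""
    r_mat, r_bod = k_mat - lam_mat, k_bod - lam_bod
    s2_mat = (r_mat ** 2 - k_mat).sum() / (lam_mat ** 2).sum()
    s2_bod = (r_bod ** 2 - k_bod).sum() / (lam_bod ** 2).sum()
    s_bm = (r_mat * r_bod).sum() / (lam_mat * lam_bod).sum()
    return s2_mat, s2_bod, s_bm
```

Applied to data simulated as in the sketch of Section 6.2.2 (with $T_i = 1$, so that totals and annual counts coincide), these estimators approximately recover the variances and covariance used in the simulation.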

6.2.8 Linear Credibility Premiums

Denote as

$$\lambda_{i,T_i+1}^{mat} = d_{i,T_i+1}\exp\big(\text{score}_{i,T_i+1}^{mat}\big) \quad\text{and}\quad \lambda_{i,T_i+1}^{bod} = d_{i,T_i+1}\exp\big(\text{score}_{i,T_i+1}^{bod}\big)$$

the expected claim frequencies for policyholder $i$ in period $T_i + 1$. The best linear predictor

$$c_{i0}^{mat} + \sum_{t=1}^{T_i} c_{it}^{mat/mat} N_{it}^{mat} + \sum_{t=1}^{T_i} c_{it}^{bod/mat} N_{it}^{bod}$$

of the true expected claim frequency $\lambda_{i,T_i+1}^{mat}\Theta_i^{mat}$ minimizes

$$\Psi_{mat} = E\Bigg[\bigg(\lambda_{i,T_i+1}^{mat}\Theta_i^{mat} - c_{i0}^{mat} - \sum_{t=1}^{T_i} c_{it}^{mat/mat} N_{it}^{mat} - \sum_{t=1}^{T_i} c_{it}^{bod/mat} N_{it}^{bod}\bigg)^2\Bigg].$$

Similarly, the best linear predictor

$$c_{i0}^{bod} + \sum_{t=1}^{T_i} c_{it}^{mat/bod} N_{it}^{mat} + \sum_{t=1}^{T_i} c_{it}^{bod/bod} N_{it}^{bod}$$

of $\lambda_{i,T_i+1}^{bod}\Theta_i^{bod}$ minimizes

$$\Psi_{bod} = E\Bigg[\bigg(\lambda_{i,T_i+1}^{bod}\Theta_i^{bod} - c_{i0}^{bod} - \sum_{t=1}^{T_i} c_{it}^{mat/bod} N_{it}^{mat} - \sum_{t=1}^{T_i} c_{it}^{bod/bod} N_{it}^{bod}\bigg)^2\Bigg].$$

The optima are obtained by setting to zero the derivatives of $\Psi_{mat}$ with respect to $c_{i0}^{mat}$ and to the $c_{is}^{mat/mat}$'s, that is,

$$c_{i0}^{mat} = \lambda_{i,T_i+1}^{mat} - \sum_{t=1}^{T_i} c_{it}^{mat/mat}\lambda_{it}^{mat} - \sum_{t=1}^{T_i} c_{it}^{bod/mat}\lambda_{it}^{bod}$$

$$c_{is}^{mat/mat} = \lambda_{i,T_i+1}^{mat}\sigma_{mat}^2 - \sigma_{mat}^2\sum_{t=1}^{T_i} c_{it}^{mat/mat}\lambda_{it}^{mat} - \sigma_{bm}\sum_{t=1}^{T_i} c_{it}^{bod/mat}\lambda_{it}^{bod}.$$

The last relation shows that $c_{is}^{mat/mat}$ does not depend on $s$. Similarly, one can check that $c_{is}^{mat/bod}$, $c_{is}^{bod/mat}$ and $c_{is}^{bod/bod}$ do not depend on $s$. This justifies the approach based on the aggregate data $N_{i\bullet}^{mat}$ and $N_{i\bullet}^{bod}$ (that are an exhaustive summary of past claims histories in the credibility model with static random effects).

Denoting as $c_i^{mat/mat}$ ($c_i^{bod/mat}$, $c_i^{mat/bod}$ and $c_i^{bod/bod}$, respectively) the common values of the $c_{is}^{mat/mat}$'s ($c_{is}^{bod/mat}$'s, $c_{is}^{mat/bod}$'s and $c_{is}^{bod/bod}$'s, respectively), the best linear predictors are thus of the form

$$c_i^{mat} + c_i^{mat/mat} N_{i\bullet}^{mat} + c_i^{bod/mat} N_{i\bullet}^{bod}$$

for $\lambda_{i,T_i+1}^{mat}\Theta_i^{mat}$, and

$$c_i^{bod} + c_i^{mat/bod} N_{i\bullet}^{mat} + c_i^{bod/bod} N_{i\bullet}^{bod}$$

for $\lambda_{i,T_i+1}^{bod}\Theta_i^{bod}$. The meaning of the coefficients involved in these linear predictors is as follows:

• $c_i^{mat/mat}$ evaluates the information contained in past material claims on the occurrence of future material claims;
• $c_i^{bod/mat}$ evaluates the information contained in past claims with bodily injuries on the occurrence of future material claims;
• $c_i^{mat/bod}$ evaluates the information contained in past material claims on the occurrence of future claims with bodily injuries;
• $c_i^{bod/bod}$ evaluates the information contained in past claims with bodily injuries on the occurrence of future claims with bodily injuries.

The values of these coefficients are determined by minimizing simultaneously $\Psi_{mat}$ and $\Psi_{bod}$, which may be rewritten as

$$\Psi_{mat} = E\Big[\big(\lambda_{i,T_i+1}^{mat}\Theta_i^{mat} - c_i^{mat} - c_i^{mat/mat} N_{i\bullet}^{mat} - c_i^{bod/mat} N_{i\bullet}^{bod}\big)^2\Big]$$

and as

$$\Psi_{bod} = E\Big[\big(\lambda_{i,T_i+1}^{bod}\Theta_i^{bod} - c_i^{bod} - c_i^{mat/bod} N_{i\bullet}^{mat} - c_i^{bod/bod} N_{i\bullet}^{bod}\big)^2\Big].$$

Setting to zero the partial derivatives of $\Psi_{mat}$ and $\Psi_{bod}$ with respect to the six parameters gives:

$$c_i^{mat} = \lambda_{i,T_i+1}^{mat} - c_i^{mat/mat}\lambda_{i\bullet}^{mat} - c_i^{bod/mat}\lambda_{i\bullet}^{bod} \tag{6.2}$$

$$c_i^{bod} = \lambda_{i,T_i+1}^{bod} - c_i^{mat/bod}\lambda_{i\bullet}^{mat} - c_i^{bod/bod}\lambda_{i\bullet}^{bod} \tag{6.3}$$

$$0 = \lambda_{i,T_i+1}^{mat} E\big[\Theta_i^{mat} N_{i\bullet}^{mat}\big] - c_i^{mat}\lambda_{i\bullet}^{mat} - c_i^{mat/mat} E\big[(N_{i\bullet}^{mat})^2\big] - c_i^{bod/mat} E\big[N_{i\bullet}^{mat} N_{i\bullet}^{bod}\big] \tag{6.4}$$

$$0 = \lambda_{i,T_i+1}^{mat} E\big[\Theta_i^{mat} N_{i\bullet}^{bod}\big] - c_i^{mat}\lambda_{i\bullet}^{bod} - c_i^{mat/mat} E\big[N_{i\bullet}^{mat} N_{i\bullet}^{bod}\big] - c_i^{bod/mat} E\big[(N_{i\bullet}^{bod})^2\big] \tag{6.5}$$

$$0 = \lambda_{i,T_i+1}^{bod} E\big[\Theta_i^{bod} N_{i\bullet}^{mat}\big] - c_i^{bod}\lambda_{i\bullet}^{mat} - c_i^{mat/bod} E\big[(N_{i\bullet}^{mat})^2\big] - c_i^{bod/bod} E\big[N_{i\bullet}^{mat} N_{i\bullet}^{bod}\big] \tag{6.6}$$

$$0 = \lambda_{i,T_i+1}^{bod} E\big[\Theta_i^{bod} N_{i\bullet}^{bod}\big] - c_i^{bod}\lambda_{i\bullet}^{bod} - c_i^{mat/bod} E\big[N_{i\bullet}^{mat} N_{i\bullet}^{bod}\big] - c_i^{bod/bod} E\big[(N_{i\bullet}^{bod})^2\big] \tag{6.7}$$

The expectations involved in this system are given by

$$E\big[\Theta_i^{mat} N_{i\bullet}^{mat}\big] = \lambda_{i\bullet}^{mat}\sigma_{mat}^2 + \lambda_{i\bullet}^{mat}$$
$$E\big[\Theta_i^{bod} N_{i\bullet}^{bod}\big] = \lambda_{i\bullet}^{bod}\sigma_{bod}^2 + \lambda_{i\bullet}^{bod}$$
$$E\big[\Theta_i^{mat} N_{i\bullet}^{bod}\big] = \lambda_{i\bullet}^{bod}\sigma_{bm} + \lambda_{i\bullet}^{bod}$$
$$E\big[\Theta_i^{bod} N_{i\bullet}^{mat}\big] = \lambda_{i\bullet}^{mat}\sigma_{bm} + \lambda_{i\bullet}^{mat}$$
$$E\big[(N_{i\bullet}^{mat})^2\big] = \lambda_{i\bullet}^{mat} + \big(\lambda_{i\bullet}^{mat}\big)^2\big(\sigma_{mat}^2 + 1\big)$$
$$E\big[(N_{i\bullet}^{bod})^2\big] = \lambda_{i\bullet}^{bod} + \big(\lambda_{i\bullet}^{bod}\big)^2\big(\sigma_{bod}^2 + 1\big)$$
$$E\big[N_{i\bullet}^{mat} N_{i\bullet}^{bod}\big] = \lambda_{i\bullet}^{mat}\lambda_{i\bullet}^{bod}\big(\sigma_{bm} + 1\big).$$

Inserting these expressions in (6.4)–(6.7), we get

$$\lambda_{i,T_i+1}^{mat}\sigma_{mat}^2 = c_i^{mat/mat}\big(1 + \lambda_{i\bullet}^{mat}\sigma_{mat}^2\big) + c_i^{bod/mat}\lambda_{i\bullet}^{bod}\sigma_{bm}$$
$$\lambda_{i,T_i+1}^{mat}\sigma_{bm} = c_i^{mat/mat}\lambda_{i\bullet}^{mat}\sigma_{bm} + c_i^{bod/mat}\big(1 + \lambda_{i\bullet}^{bod}\sigma_{bod}^2\big)$$
$$\lambda_{i,T_i+1}^{bod}\sigma_{bm} = c_i^{mat/bod}\big(1 + \lambda_{i\bullet}^{mat}\sigma_{mat}^2\big) + c_i^{bod/bod}\lambda_{i\bullet}^{bod}\sigma_{bm}$$
$$\lambda_{i,T_i+1}^{bod}\sigma_{bod}^2 = c_i^{mat/bod}\lambda_{i\bullet}^{mat}\sigma_{bm} + c_i^{bod/bod}\big(1 + \lambda_{i\bullet}^{bod}\sigma_{bod}^2\big)$$

and finally, writing

$$D = \big(1 + \lambda_{i\bullet}^{mat}\sigma_{mat}^2\big)\big(1 + \lambda_{i\bullet}^{bod}\sigma_{bod}^2\big) - \lambda_{i\bullet}^{mat}\lambda_{i\bullet}^{bod}\sigma_{bm}^2$$

for the common denominator,

$$c_i^{mat/mat} = \lambda_{i,T_i+1}^{mat}\,\frac{\sigma_{mat}^2 + \lambda_{i\bullet}^{bod}\big(\sigma_{mat}^2\sigma_{bod}^2 - \sigma_{bm}^2\big)}{D}, \qquad c_i^{bod/mat} = \lambda_{i,T_i+1}^{mat}\,\frac{\sigma_{bm}}{D},$$

$$c_i^{mat/bod} = \lambda_{i,T_i+1}^{bod}\,\frac{\sigma_{bm}}{D}, \qquad c_i^{bod/bod} = \lambda_{i,T_i+1}^{bod}\,\frac{\sigma_{bod}^2 + \lambda_{i\bullet}^{mat}\big(\sigma_{mat}^2\sigma_{bod}^2 - \sigma_{bm}^2\big)}{D}.$$

We see that $c_i^{bod/mat}$ and $c_i^{mat/bod}$ are increasing with $\sigma_{bm}$, and decreasing with $\sigma_{mat}^2$ and $\sigma_{bod}^2$. This is intuitively acceptable: the more the random effects are correlated, the more information is contained in the claims with material damage only about the claims with bodily injuries, and vice versa. Inserting these solutions in the equations (6.2)–(6.3) gives

$$c_i^{mat} = \lambda_{i,T_i+1}^{mat}\,\frac{1 + \lambda_{i\bullet}^{bod}\big(\sigma_{bod}^2 - \sigma_{bm}\big)}{D} \quad\text{and}\quad c_i^{bod} = \lambda_{i,T_i+1}^{bod}\,\frac{1 + \lambda_{i\bullet}^{mat}\big(\sigma_{mat}^2 - \sigma_{bm}\big)}{D}.$$

To sum up, the best linear predictor for the expected frequency of claims with material damage only occurring in year $T_i + 1$ is

$$\lambda_{i,T_i+1}^{mat}\,\frac{1 + \lambda_{i\bullet}^{bod}\big(\sigma_{bod}^2 - \sigma_{bm}\big) + \sigma_{bm} N_{i\bullet}^{bod} + \Big(\sigma_{mat}^2 + \lambda_{i\bullet}^{bod}\big(\sigma_{mat}^2\sigma_{bod}^2 - \sigma_{bm}^2\big)\Big) N_{i\bullet}^{mat}}{D}$$

and the best linear predictor for the expected frequency of claims with bodily injuries in year $T_i + 1$ is

$$\lambda_{i,T_i+1}^{bod}\,\frac{1 + \lambda_{i\bullet}^{mat}\big(\sigma_{mat}^2 - \sigma_{bm}\big) + \sigma_{bm} N_{i\bullet}^{mat} + \Big(\sigma_{bod}^2 + \lambda_{i\bullet}^{mat}\big(\sigma_{mat}^2\sigma_{bod}^2 - \sigma_{bm}^2\big)\Big) N_{i\bullet}^{bod}}{D}.$$
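These closed-form expressions are straightforward to implement; the sketch below (ours) evaluates both predictors, and setting the next-year frequencies to 1 returns the correction coefficients directly:

```python
def credibility_frequencies(lam_next_mat, lam_next_bod, lam_mat, lam_bod,
                            n_mat, n_bod, s2_mat, s2_bod, s_bm):
    """Best linear predictors derived above; lam_mat/lam_bod are the
    accumulated a priori frequencies, n_mat/n_bod the observed totals."""
    d = ((1 + lam_mat * s2_mat) * (1 + lam_bod * s2_bod)
         - lam_mat * lam_bod * s_bm ** 2)
    num_mat = (1 + lam_bod * (s2_bod - s_bm) + s_bm * n_bod
               + (s2_mat + lam_bod * (s2_mat * s2_bod - s_bm ** 2)) * n_mat)
    num_bod = (1 + lam_mat * (s2_mat - s_bm) + s_bm * n_mat
               + (s2_bod + lam_mat * (s2_mat * s2_bod - s_bm ** 2)) * n_bod)
    return lam_next_mat * num_mat / d, lam_next_bod * num_bod / d

# Good driver of Section 6.2.9 below after one claim-free year: the output
# comes out close to the 92.9 % and 94.2 % relativities reported in Table 6.2.
print(credibility_frequencies(1.0, 1.0, 0.083, 0.010, 0, 0,
                              0.8458, 1.1188, 0.6255))
```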

6.2.9 Numerical Illustration for Portfolio A

A Priori Ratemaking

The observed annual frequency for claims with bodily injuries is 1.6 %. The observed annual frequency for claims with material damage only is 13.0 %.

In this portfolio, the two types of claims we consider are positively correlated. This can be seen from Table 6.1, where the conditional expectation of the number of claims of one type is computed given the number of claims of the other type. The more claims of one type reported, the higher this conditional expectation, resulting in positive dependence.

A Posteriori Corrections

Let us now update the claim frequencies with the help of the formulas obtained with linear credibility. To this end, we first estimate $\Sigma$. This gives

$$\widehat{\sigma}_{mat}^2 = 0.8458, \qquad \widehat{\sigma}_{bod}^2 = 1.1188, \qquad \widehat{\sigma}_{bm} = 0.6255.$$

Let us now consider two types of drivers:

• a good driver with $\lambda_{it}^{mat} = 0.083$ and $\lambda_{it}^{bod} = 0.010$, and
• a bad driver with $\lambda_{it}^{mat} = 0.246$ and $\lambda_{it}^{bod} = 0.030$.

Table 6.1 Observed annual frequency of claims with bodily injuries, given the number of claims with material damage only; and observed annual frequency of claims with material damage only, given the number of claims with bodily injuries; for Portfolio A.

Number of claims observed     Expected frequency of claims      Expected frequency of claims
of the conditioning type      with bodily injuries, given the   with material damage only, given
                              number of material damage claims  the number of bodily injury claims
= 0                           1.5 %                             12.9 %
= 1                           2.7 %                             22.8 %
= 2                           3.6 %                             37.7 %



Table 6.2 Evolution of relativities and pure premiums (taking the average cost of a claim with material damage only (mat) as the monetary unit, and assuming that claims with bodily injuries (bod) are on average ten times more expensive) if no claim has been reported.

        ———————— Good driver ————————    ———————— Bad driver ————————
Time    Rel. mat   Rel. bod   Premium    Rel. mat   Rel. bod   Premium
  1      92.9 %     94.2 %    16.8 %      81.5 %     84.6 %    45.1 %
  2      86.7 %     89.1 %    15.8 %      68.7 %     73.9 %    38.8 %
  3      81.3 %     84.6 %    15.0 %      59.3 %     66.0 %    34.1 %
  4      76.6 %     80.6 %    14.2 %      52.1 %     59.9 %    30.6 %
  5      72.3 %     77.1 %    13.5 %      46.5 %     55.1 %    27.7 %
  6      68.5 %     73.9 %    12.8 %      41.9 %     51.1 %    25.4 %
  7      65.0 %     71.0 %    12.3 %      38.2 %     47.8 %    23.5 %
  8      61.9 %     68.4 %    11.8 %      35.0 %     45.0 %    21.9 %
  9      59.1 %     66.0 %    11.3 %      32.3 %     42.5 %    20.5 %
 10      56.5 %     63.8 %    10.9 %      30.0 %     40.4 %    19.3 %

Table 6.2 displays the results for the case where no claim is reported for 10 years. The first column gives the coefficient to be applied to $\lambda_{it}^{mat}$, the second column gives the coefficient to be applied to $\lambda_{it}^{bod}$, and the third column gives the premium to be charged if the average cost of a material damage claim is 1 and the average cost of a bodily injury claim is 10 (the monetary unit is thus the average cost of a claim with material damage only, and claims with bodily injuries are assumed to be on average ten times more expensive than claims with material damage only). The first three columns are for the good driver and the next three are for the bad driver. We see that the correction coefficients are always smaller for the claims with material damage only than for the claims with bodily injuries. This is due to the fact that the former claims occur more frequently than the latter ones, so that not reporting any claim with material damage only entails a larger premium discount. As explained previously, the discounts are always larger for a bad driver than for a good one. However, the premiums always stay higher for the bad drivers.

Table 6.3 considers the case where the policyholder reported a single claim with material damage only during the first year, following the updated premiums over 10 years. Finally, Table 6.4 considers the analogous case of a single claim with bodily injuries. Comparing these two tables, we see that the premium amount is larger if a claim with bodily injuries has been reported, compared to the case where a claim with material damage only has been reported. The cost of the claim is thus taken into account in the premium correction. The correction coefficients are always larger for the good driver than for the bad one, as explained previously. It is also interesting to note that reporting a claim of one type always increases the probability of reporting a claim of the other type. There is thus a double effect when updating the premium: the frequency of claims of the same type as the one that has been reported is increased, but the frequency of claims of the other type gets inflated, too.



Table 6.3 Evolution of relativities and pure premiums (taking the average cost of a claim with material damage only (mat) as the monetary unit, and assuming that claims with bodily injuries (bod) are on average ten times more expensive) if a single claim with material damage only has been reported during the first year.

        ———————— Good driver ————————    ———————— Bad driver ————————
Time    Rel. mat   Rel. bod   Premium    Rel. mat   Rel. bod   Premium
  1     171.6 %    152.0 %    29.0 %     150.7 %    134.9 %    77.0 %
  2     160.3 %    142.8 %    27.2 %     127.3 %    115.7 %    65.6 %
  3     150.4 %    134.7 %    25.6 %     110.1 %    101.6 %    57.2 %
  4     141.7 %    127.6 %    24.1 %      97.0 %     90.7 %    50.8 %
  5     133.9 %    121.2 %    22.9 %      86.7 %     82.2 %    45.7 %
  6     126.9 %    115.5 %    21.7 %      78.3 %     75.2 %    41.6 %
  7     120.6 %    110.3 %    20.7 %      71.4 %     69.4 %    38.1 %
  8     114.9 %    105.7 %    19.8 %      65.6 %     64.6 %    35.3 %
  9     109.7 %    101.4 %    18.9 %      60.7 %     60.4 %    32.8 %
 10     105.0 %     97.5 %    18.2 %      56.5 %     56.8 %    30.7 %

Table 6.4 Evolution of relativities and pure premiums (taking the average cost of a claim with material damage only (mat) as the monetary unit, and assuming that claims with bodily injuries (bod) are on average ten times more expensive) if a single claim with bodily injuries has been reported during the first year.

        ———————— Good driver ————————    ———————— Bad driver ————————
Time    Rel. mat   Rel. bod   Premium    Rel. mat   Rel. bod   Premium
  1     150.7 %    201.9 %    32.1 %     131.7 %    185.5 %    87.3 %
  2     140.5 %    193.1 %    30.4 %     110.4 %    166.8 %    76.6 %
  3     131.5 %    185.4 %    28.9 %      94.8 %    152.9 %    68.6 %
  4     123.5 %    178.5 %    27.6 %      82.9 %    142.0 %    62.4 %
  5     116.5 %    172.3 %    26.4 %      73.6 %    133.2 %    57.5 %
  6     110.1 %    166.7 %    25.3 %      66.0 %    125.8 %    53.5 %
  7     104.4 %    161.7 %    24.3 %      59.8 %    119.6 %    50.1 %
  8      99.2 %    157.0 %    23.5 %      54.6 %    114.3 %    47.3 %
  9      94.5 %    152.8 %    22.7 %      50.2 %    109.6 %    44.8 %
 10      90.2 %    148.9 %    21.9 %      46.4 %    105.5 %    42.6 %
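As a cross-check, the predictor function sketched at the end of Section 6.2.8 approximately reproduces the first rows of Tables 6.3 and 6.4 for the good driver (last-digit discrepancies stem from the four-decimal rounding of the estimated variances):

```python
# One material-damage claim reported in year one (Table 6.3, time 1):
print(credibility_frequencies(1.0, 1.0, 0.083, 0.010, 1, 0,
                              0.8458, 1.1188, 0.6255))   # ~ (1.716, 1.519)
# One bodily-injury claim reported in year one (Table 6.4, time 1):
print(credibility_frequencies(1.0, 1.0, 0.083, 0.010, 0, 1,
                              0.8458, 1.1188, 0.6255))   # ~ (1.507, 2.018)
```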

6.3 Multi-Event Bonus-Malus Scales

6.3.1 Types of Claims

Here we adopt the same assumptions as in Chapter 4. Let us pick a policyholder at random from the portfolio and let us denote as $N$ the number of claims reported during the year. Furthermore, let $\Lambda$ be the (unknown) a priori expected claim frequency, with $\Pr[\Lambda = \lambda_k] = w_k$. Denoting as $\Theta$ the (unknown) accident proneness of this policyholder, the conditional probability mass function of $N$ is given by

$$\Pr[N = j \mid \Theta = \theta, \Lambda = \lambda_k] = \exp(-\lambda_k\theta)\,\frac{(\lambda_k\theta)^j}{j!}, \qquad j = 0, 1, 2, \ldots$$

The risk profile of the portfolio is described by the distribution function $F_\Theta$ of $\Theta$ and we assume that $E[\Theta] = 1$. Since $\Theta$ represents the residual effect of unobserved characteristics, it seems reasonable to assume that $\Theta$ and $\Lambda$ are mutually independent. Hence, the unconditional probability mass function of $N$ is given by

$$\Pr[N = j] = \sum_k w_k \int_0^{+\infty} \Pr[N = j \mid \Theta = \theta, \Lambda = \lambda_k]\, dF_\Theta(\theta), \qquad j = 0, 1, \ldots$$

We distinguish among $m$ different types of claim reported by the policyholder. Each type of claim induces a specific penalty for the policyholder. For instance, one could think of

• claims with bodily injuries and claims with material damage only ($m = 2$)
• claims with partial liability and claims with full liability ($m = 2$)
• introducing claim severities (for instance, claims with amount less than €1000, between €1000 and €10 000, and claims above €10 000, so that $m = 3$). In this case, we have to assume that claim severities and claim frequencies are mutually independent.

Here, we will assume that the claims are classified according to a multinomial scheme. Specifically, each time a claim is reported, it is classified in one of the $m$ possible categories, with probabilities $q_1, \ldots, q_m$. Let us denote as $N_i$ the number of claims of type $i$. Then, the random vector $(N_1, \ldots, N_m)$ is Multinomially distributed, with probability mass function

$$\Pr[N_1 = k_1, \ldots, N_m = k_m] = \begin{cases} \dfrac{n!}{k_1!\cdots k_m!}\, q_1^{k_1}\cdots q_m^{k_m} & \text{if } k_1 + \cdots + k_m = n \\ 0 & \text{otherwise} \end{cases}$$

where $n$ is the total number of claims.

Each of the $m$ components separately has a Binomial distribution with parameters $n$ and $q_i$, for the appropriate value of the subscript $i$, that is, $N_i \sim \mathcal{B}in(n, q_i)$. Because of the constraint that the sum of the components is $n$, that is, $N_1 + \cdots + N_m = n$, they are negatively correlated.

The expected value is $E[N_i] = nq_i$. The covariance matrix is as follows: each diagonal entry is the variance of a Binomially distributed random variable, and is therefore $V[N_i] = nq_i(1 - q_i)$. The off-diagonal entries are the covariances: $C[N_i, N_j] = -nq_iq_j$ for $i$, $j$ distinct. This is an $m \times m$ nonnegative-definite matrix of rank $m - 1$.

We will use the following result.

Property 6.1 Let us assume that the total number of claims $N$ is $\mathcal{P}oi(\lambda)$ distributed. Assume that the $N$ claims may be classified into $m$ categories, according to a multinomial partitioning scheme with probabilities $q_1,\ldots,q_m$. Let $N_i$ represent the number of claims of type $i$, $i=1,\ldots,m$. Then the random variables $N_1,\ldots,N_m$ are independent and Poisson distributed with respective parameters $\lambda q_1,\ldots,\lambda q_m$.

Proof
Since, given $N=n$, $N_i\sim\mathcal{B}in(n,q_i)$, we can write

$$\Pr[N_i=k]=\sum_{n=k}^{+\infty}\Pr[N_i=k\mid N=n]\Pr[N=n]
=\sum_{n=k}^{+\infty}\binom{n}{k}q_i^k(1-q_i)^{n-k}\exp(-\lambda)\frac{\lambda^n}{n!}
=\exp(-\lambda)\frac{(\lambda q_i)^k}{k!}\sum_{n=0}^{+\infty}\frac{\big(\lambda(1-q_i)\big)^n}{n!}
=\exp(-\lambda q_i)\frac{(\lambda q_i)^k}{k!}$$

which proves that $N_i\sim\mathcal{P}oi(\lambda q_i)$.

We now prove the independence property of the $N_i$s. To this end, let us show that the joint probability mass function factors into the product of the marginal probability mass functions:

$$\begin{aligned}
\Pr[N_1=n_1,\ldots,N_m=n_m]
&=\Pr[N_1=n_1,\ldots,N_m=n_m\mid N=n_1+\cdots+n_m]\exp(-\lambda)\frac{\lambda^{n_1+\cdots+n_m}}{(n_1+\cdots+n_m)!}\\
&=\frac{(n_1+\cdots+n_m)!}{n_1!\cdots n_m!}\,q_1^{n_1}\cdots q_m^{n_m}\exp(-\lambda)\frac{\lambda^{n_1+\cdots+n_m}}{(n_1+\cdots+n_m)!}\\
&=\prod_{j=1}^m\exp(-\lambda q_j)\frac{(\lambda q_j)^{n_j}}{n_j!}\\
&=\prod_{j=1}^m\Pr[N_j=n_j]
\end{aligned}$$

which completes the proof. □
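To make the thinning property concrete, here is a minimal simulation sketch in Python; the frequency (0.15) and the type probabilities (0.8, 0.2) are invented for the illustration and are not taken from the book's portfolios.

```python
import numpy as np

# Empirical check of Property 6.1 (Poisson thinning); lam and q are
# purely illustrative values, not taken from the portfolios of the book.
rng = np.random.default_rng(seed=1)
lam, q = 0.15, np.array([0.8, 0.2])
n_sim = 200_000

N = rng.poisson(lam, size=n_sim)                      # total claim numbers
# classify the N claims of each policy multinomially into m = 2 types
counts = np.array([rng.multinomial(n, q) for n in N])

print("means      :", counts.mean(axis=0), "target:", lam * q)
print("variances  :", counts.var(axis=0), "target:", lam * q)  # Poisson: mean = variance
print("corr(N1,N2):", np.corrcoef(counts[:, 0], counts[:, 1])[0, 1])  # ~ 0
```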

Let us now denote as $q_{k1},q_{k2},\ldots,q_{km}$ the probability that the claim is of type $1,2,\ldots,m$, respectively, for a policyholder with $\Lambda=\lambda_k$. The identity $q_{k1}+q_{k2}+\cdots+q_{km}=1$ obviously holds true. Now, let $N_1,N_2,\ldots,N_m$ be the number of claims of type $1,2,\ldots,m$, respectively. Considering Property 6.1, given $\Lambda$ and $\Theta$, the random variables $N_1,N_2,\ldots,N_m$ are mutually independent, with respective conditional probability mass functions

$$\Pr[N_l=j\mid \Lambda=\lambda_k,\Theta=\theta]=\exp(-\lambda_k\theta q_{kl})\,\frac{(\lambda_k\theta q_{kl})^j}{j!},\qquad j=0,1,\ldots$$

for $l=1,\ldots,m$.


6.3.2 Markov Modelling for the Multi-Event Bonus-Malus Scale

The scale is assumed to have $s+1$ levels, numbered from 0 to $s$. A specified level is assigned to a new driver. Each claim-free year is rewarded by a bonus point (i.e. the driver goes one level down). Each type of claim entails a specific penalty, expressed as a fixed number of levels per claim.

We assume that the scale possesses the following memoryless property: the knowledge of the present level and of the number of claims of each type filed during the present year suffices to determine the level to which the policy is transferred. This ensures that the bonus-malus system may be represented by a Markov chain (at least conditionally on the observable characteristics and random effects).

Let $p_{l_1l_2}(\lambda,\mathbf{q})$ be the probability of moving from level $l_1$ to level $l_2$ for a policyholder with annual mean claim frequency $\lambda$ and vector of probabilities $\mathbf{q}=(q_1,\ldots,q_m)^T$; here $q_j$ is the probability that the claim be of type $j$. Further, $P(\lambda,\mathbf{q})$ is the one-step transition matrix, i.e.

$$P(\lambda,\mathbf{q})=\begin{pmatrix}p_{00}(\lambda,\mathbf{q})&\cdots&p_{0s}(\lambda,\mathbf{q})\\ \vdots&\ddots&\vdots\\ p_{s0}(\lambda,\mathbf{q})&\cdots&p_{ss}(\lambda,\mathbf{q})\end{pmatrix}.$$

Taking the $n$th power of $P(\lambda,\mathbf{q})$ yields the $n$-step transition matrix, whose element $(l_1,l_2)$, denoted as $p^{(n)}_{l_1l_2}(\lambda,\mathbf{q})$, is the probability of moving from level $l_1$ to level $l_2$ in $n$ transitions.

The transition matrix $P(\lambda,\mathbf{q})$ associated with such a bonus-malus system is assumed to be regular, i.e. there exists some integer $n_0\ge 1$ such that all entries of $\big(P(\lambda,\mathbf{q})\big)^{n_0}$ are strictly positive. Consequently, the Markov chain describing the trajectory of a policyholder with expected claim frequency $\lambda$ and vector of probabilities $\mathbf{q}$ is ergodic and thus possesses a stationary distribution

$$\boldsymbol{\pi}(\lambda,\mathbf{q})=\big(\pi_0(\lambda,\mathbf{q}),\pi_1(\lambda,\mathbf{q}),\ldots,\pi_s(\lambda,\mathbf{q})\big)^T.$$

Here, $\pi_l(\lambda,\mathbf{q})$ is the stationary probability for a policyholder with mean frequency $\lambda$ to be in level $l$, i.e.

$$\pi_{l_2}(\lambda,\mathbf{q})=\lim_{n\to+\infty}p^{(n)}_{l_1l_2}(\lambda,\mathbf{q}).$$

The stationary probabilities are directly obtained from formula (4.9).

Let $L(\lambda,\mathbf{q})$ be valued in $\{0,1,\ldots,s\}$ and conform to the distribution $\boldsymbol{\pi}(\lambda,\mathbf{q})$, i.e.

$$\Pr[L(\lambda,\mathbf{q})=l]=\pi_l(\lambda,\mathbf{q}),\qquad l=0,1,\ldots,s.$$

The variable $L(\lambda,\mathbf{q})$ thus represents the level occupied by a policyholder with annual expected claim frequency $\lambda$ and probability vector $\mathbf{q}$ once the steady state has been reached.

Now, let $L$ be the level occupied in the scale by a randomly selected policyholder once the steady state has been reached. The distribution of $L$ can be written as

$$\Pr[L=l]=\sum_k\int_0^{+\infty}w_k\,\pi_l(\lambda_k\theta,\mathbf{q}_k)\,dF_\Theta(\theta),\qquad l=0,1,\ldots,s.\qquad(6.8)$$


6.3.3 Determination of the Relativities

The relativity associated with level $l$ is denoted as $r_l$, as before. The meaning is that an insured occupying that level pays an amount of premium equal to $r_l\,\%$ of the reference premium determined on the basis of his observable characteristics.

As in Chapter 4, our aim is to minimize the expected squared difference between the 'true' relative premium $\Theta$ and the relative premium $r_L$ applicable to this policyholder (after the stationary state has been reached), i.e. the goal is to minimize

$$\mathrm{E}\big[(\Theta-r_L)^2\big]=\sum_{l=0}^s\mathrm{E}\big[(\Theta-r_l)^2\,\big|\,L=l\big]\Pr[L=l]
=\sum_k w_k\int_0^{+\infty}\sum_{l=0}^s(\theta-r_l)^2\,\pi_l(\lambda_k\theta,\mathbf{q}_k)\,dF_\Theta(\theta).$$

The solution is given by

$$r_l=\mathrm{E}[\Theta\mid L=l]=\frac{\int_0^{+\infty}\sum_k w_k\,\theta\,\pi_l(\lambda_k\theta,\mathbf{q}_k)\,dF_\Theta(\theta)}{\int_0^{+\infty}\sum_k w_k\,\pi_l(\lambda_k\theta,\mathbf{q}_k)\,dF_\Theta(\theta)}.\qquad(6.9)$$

It is easily seen that $\mathrm{E}[r_L]=1$, resulting in financial equilibrium once the steady state is reached.

6.3.4 Numerical Illustrations

The −1/+2/+3 Bonus-Malus Scale

Let us now consider the scale with six levels (numbered from 0 to 5) already used in the previous chapters. But now, instead of considering a single type of claim, we penalize differently claims with bodily injuries and claims with material damage only. If no claims have been reported then the policyholder moves one level down. Claims with material damage only are penalized by two levels whereas claims with bodily injuries entail a penalty of three levels. If $n_1$ claims with bodily injuries and $n_2$ claims with material damage only are reported during the year then the policyholder moves $3n_1+2n_2$ levels up. This system is abbreviated as −1/+2/+3, in obvious notation.

The transition matrix for a policyholder with annual mean claim frequency $\lambda$ and vector of probabilities $\mathbf{q}=(q_1,q_2)^T$ is given by

$$P(\lambda,\mathbf{q})=\begin{pmatrix}
P_0&0&P_1&P_2&P_3&1-\sigma_1\\
P_0&0&0&P_1&P_2&1-\sigma_2\\
0&P_0&0&0&P_1&1-\sigma_3\\
0&0&P_0&0&0&1-\sigma_4\\
0&0&0&P_0&0&1-\sigma_4\\
0&0&0&0&P_0&1-\sigma_4
\end{pmatrix}$$

where

$$\begin{aligned}
P_0&=\exp(-\lambda)\\
P_1&=\lambda q_2\exp(-\lambda q_2)\exp(-\lambda q_1)\\
P_2&=\exp(-\lambda q_2)\,\lambda q_1\exp(-\lambda q_1)\\
P_3&=\frac{(\lambda q_2)^2}{2}\exp(-\lambda q_2)\exp(-\lambda q_1)
\end{aligned}$$

and $\sigma_i$ represents the sum of all the elements in row $i$. Specifically,

$$\sigma_1=P_0+P_1+P_2+P_3,\qquad \sigma_2=P_0+P_1+P_2,\qquad \sigma_3=P_0+P_1,\qquad \sigma_4=P_0.$$

Here $P_0$ is the probability of a claim-free year, $P_1$ the probability of a single claim with material damage only (two levels up), $P_2$ the probability of a single claim with bodily injuries (three levels up) and $P_3$ the probability of exactly two claims with material damage only (four levels up).
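The following sketch assembles this transition matrix, computes its stationary distribution and approximates the relativities (6.9) numerically. It assumes a hypothetical two-class portfolio and a Gamma distributed $\Theta$ with unit mean; all numerical inputs (class weights, frequencies, claim-type probabilities, Gamma parameter) are invented and do not reproduce Portfolio A or B.

```python
import numpy as np
from scipy import stats

def transition_matrix(lam, q1, q2):
    """One-step matrix of the -1/+2/+3 scale (q1: bodily injury probability)."""
    e1, e2 = np.exp(-lam * q1), np.exp(-lam * q2)
    P0 = np.exp(-lam)                       # claim-free year
    P1 = lam * q2 * e2 * e1                 # one material damage claim (+2)
    P2 = e2 * lam * q1 * e1                 # one bodily injury claim (+3)
    P3 = (lam * q2) ** 2 / 2 * e2 * e1      # two material damage claims (+4)
    P = np.zeros((6, 6))
    P[0, [0, 2, 3, 4]] = [P0, P1, P2, P3]
    P[1, [0, 3, 4]] = [P0, P1, P2]
    P[2, [1, 4]] = [P0, P1]
    for l in (3, 4, 5):
        P[l, l - 1] = P0
    P[:, 5] += 1.0 - P.sum(axis=1)          # heavier penalties all end in level 5
    return P

def stationary(P):
    """Solve pi P = pi with sum(pi) = 1, by least squares."""
    s = P.shape[0]
    A = np.vstack([P.T - np.eye(s), np.ones(s)])
    b = np.zeros(s + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Hypothetical portfolio: (weight w_k, frequency lam_k, q_k1, q_k2) per class,
# and Theta ~ Gamma(a, a) so that E[Theta] = 1.
classes = [(0.6, 0.10, 0.10, 0.90), (0.4, 0.20, 0.15, 0.85)]
a = 1.0
theta = np.linspace(1e-3, 8.0, 400)
dtheta = theta[1] - theta[0]
dens = stats.gamma.pdf(theta, a, scale=1.0 / a)

num, den = np.zeros(6), np.zeros(6)
for w, lam, q1, q2 in classes:
    pis = np.array([stationary(transition_matrix(lam * t, q1, q2)) for t in theta])
    num += w * (theta * dens) @ pis * dtheta    # numerator of (6.9)
    den += w * dens @ pis * dtheta              # denominator of (6.9)
print("relativities (%):", np.round(100.0 * num / den, 1))
```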

Computation of the Relativities

Portfolio A. Table 6.5 gives, for each of the six levels of the bonus-malus scale, the proportion of the portfolio in that level (column 2) and the relativity attached to that level (column 3) for the system −1/+2/+3. About 70 % of the portfolio is in level 0 and enjoys a discount of about 30 %. The rest of the portfolio is spread out among levels 1–5. The $r_l$s range from 68.0 % to 262.6 %.

In order to compare the results with those of traditional bonus-malus scales, we have also recalled the results given by the three other scales already considered in the previous chapters. For all of them, each claim-free year is rewarded by one level down in the scale. The first bonus-malus system penalizes each claim (with or without bodily injuries) by two levels up in the scale (−1/+2 bonus-malus scale), the second one by three levels up (−1/+3 bonus-malus scale) and the third one sends the policyholder to the maximal level after one claim (−1/top bonus-malus scale).

Clearly, the more severely the claims are penalized, the larger the discounts awarded to the policyholders occupying the lowest levels, and the smaller the penalties imposed on the policyholders in the upper part of the scale. The scale −1/+2/+3 is closest to the scale −1/+2. This is because the majority of the claims only induce material damage. Nevertheless, $r_5$ is reduced from 271.4 % to 262.6 % when the claims with bodily injuries are more severely penalized.

Table 6.5 Results for the bonus-malus systems −1/+2/+3, −1/+2, −1/+3 and −1/top for Portfolio A.

           −1/+2/+3            −1/+2               −1/+3               −1/top
Level l   Pr[L=l]   r_l       Pr[L=l]   r_l       Pr[L=l]   r_l       Pr[L=l]   r_l
5          4.7 %   262.6 %     4.4 %   271.4 %     7.3 %   230.8 %    12.8 %   181.2 %
4          4.7 %   215.8 %     4.7 %   218.5 %     5.9 %   200.9 %     9.7 %   159.9 %
3          4.9 %   182.1 %     4.4 %   192.5 %     9.0 %   145.2 %     7.7 %   143.9 %
2          8.6 %   138.0 %     8.7 %   138.8 %     7.3 %   133.1 %     6.2 %   131.3 %
1          7.1 %   127.8 %     7.1 %   128.6 %     6.0 %   123.0 %     5.2 %   120.9 %
0         70.0 %    68.0 %    70.6 %    68.5 %    64.5 %    64.2 %    58.5 %    61.2 %


Table 6.6 Results for the bonus-malus systems −1/+2/+3, −1/+2, −1/+3 and −1/top for Portfolio B.

           −1/+2/+3            −1/+2               −1/+3               −1/top
Level l   Pr[L=l]   r_l       Pr[L=l]   r_l       Pr[L=l]   r_l       Pr[L=l]   r_l
5          5.9 %   215.9 %     5.4 %   223.2 %     9.2 %   184.0 %    16.3 %   146.9 %
4          6.2 %   169.5 %     6.0 %   171.3 %     7.8 %   156.6 %    12.5 %   129.3 %
3          6.4 %   143.4 %     5.8 %   148.9 %    11.7 %   117.4 %     9.9 %   117.2 %
2         11.2 %   111.8 %    11.5 %   112.1 %     9.5 %   108.6 %     8.0 %   108.0 %
1          9.2 %   104.6 %     9.3 %   105.0 %     7.8 %   101.6 %     6.6 %   100.7 %
0         61.1 %    74.5 %    62.0 %    74.7 %    54.0 %    72.0 %    46.6 %    70.6 %

Portfolio B. Table 6.6 gives, for each of the six levels of the bonus-malus scale, the proportion of the portfolio in that level (column 2) and the relativity attached to that level (column 3) for the system −1/+2/+3. About 60 % of the portfolio is in level 0 and enjoys a discount of about 25 %. The rest of the portfolio is spread out among levels 1–5. The $r_l$s range from 74.5 % to 215.9 %.

In order to compare the results with those of traditional bonus-malus scales, we have also recalled the results given by the three bonus-malus scales −1/+2, −1/+3 and −1/top already considered in the previous chapters. The scale −1/+2/+3 is close to the scale −1/+2 (again, this is because the majority of the claims only induce material damage). Nevertheless, $r_5$ is reduced from 223.2 % to 215.9 % when the claims with bodily injuries are more severely penalized.

6.4 Further Reading and Bibliographic Notes

The second part of this chapter is based on Pitrebois, Denuit & Walhin (2006a).

Lemaire (1995, Chapter 13) applied a model proposed by Picard (1976) to Belgian data, distinguishing the accidents that caused property damage only from those that caused bodily injuries. The credibility model proposed by Lemaire (1995) is based on a Poisson-Gamma mixture, and assumes that, given the expected annual claim frequency of the policyholder, the frequency of claims with bodily injuries conforms to a Beta distribution. This approach can be extended to several categories of claims using a Dirichlet distribution (that is, using a suitable multivariate Beta distribution).

With the aid of multi-equation Poisson models with random effects, Pinquet (1998) designed an optimal credibility model for different types of claims. As an example, claims are separated into two groups according to fault with respect to a third party. See also Pinquet (1997) on allowing for the costs of the claims.

This chapter gives only basic methods to deal with different types of claims in a posteriori ratemaking. Advanced statistical and econometric models could certainly improve the actuarial analysis. For instance, Wedel, Böckenholt & Kamakura (2003) developed a general class of factor-analytic models for the analysis of multivariate (truncated) count data. These models provide a parsimonious and easy-to-interpret representation of multivariate dependencies in counts that extends the general linear latent variable model.


7

Bonus-Malus Systems with Varying Deductibles

7.1 Introduction

In this chapter, we compare bonus-malus systems to deductibles. Specifically, we design a system in which the policyholders in the malus zone are allowed to choose at each renewal between a premium surcharge (induced by the relativities associated with the bonus-malus scale) or a deductible in case of a claim during the forthcoming year. If the deductible is selected, this induces a strong incentive to careful driving. According to signal theory, drivers opting for the deductible are expected to be better drivers (on average) than those paying the premium surcharge induced by the upward move in the bonus-malus scale.

Bonus-malus systems do not take claim amounts into account, so that a posteriori corrections rely only on the number of claims. Holtan (1994) suggested the use of very high deductibles that may be borrowed by the policyholder from the insurance company. Although technically acceptable, this approach obviously causes considerable practical problems. While Holtan (1994) assumes a high deductible which is constant for all policyholders, and thus independent of the level they occupy in the bonus-malus scale at the claim occurrence time, the present chapter lets the deductible vary between the levels of the bonus-malus scale, and also considers a mixed setup with both a premium and a deductible surcharge after a claim.

Specifically, the a posteriori premium correction induced by the bonus-malus scale is replaced with a deductible (in whole or in part). To each level of the bonus-malus scale is attached an amount of deductible, applied to the claims filed during the coverage period. Combining bonus-malus scales with varying deductibles presents a number of advantages:

(1) according to signal theory, policyholders choosing the varying deductible should be good drivers;


(2) even if the policyholder leaves the company after a claim, he has to pay the deductible (although, in motor third party liability insurance, there may be difficulties linked to the collection of the deductible);

(3) in the mixed system, the $r_l$s and the severity of the deductibles may be tuned in an optimal way in order to attract policyholders.

The numerical study conducted in this chapter shows that, provided appropriate values for the parameters are selected, the amounts of deductible are moderate in the mixed case (reduced relativities combined with deductibles per claim).

Although deductibles can be difficult to implement in motor third party liability insurance (companies indemnify the third parties directly and so cannot simply reduce the amount they pay), the system is nevertheless easy to implement for first party coverages (for which the payments are made directly to the policyholders), like material damage for instance. In the European Union, the premium for the material damage cover is either subject to the bonus-malus system applying to motor third party liability or to a specific bonus-malus system. In the latter case, using the techniques suggested in this chapter, one could replace the bonus-malus scale for material damage with deductibles determined by past claim history. This chapter will therefore focus on optional coverages.

7.2 Distribution of the Annual Aggregate Claims

7.2.1 Modelling Claim Costs

As explained in the introduction, the strategy designed in this chapter is difficult to apply in motor third party liability insurance. To fix the ideas, we consider here first party coverages for material damage, so that the actuary is not faced with possible large losses. The total claim cost can thus be modelled using a compound mixed Poisson model. Specifically, let us denote as $C_1,C_2,\ldots,C_N$ the amounts of the $N$ claims reported by the policyholder. The total claim amount for this policy is

$$S=\sum_{k=1}^N C_k\qquad(7.1)$$

with the convention that the empty sum equals 0. The severities $C_1,C_2,\ldots$ are assumed to be independent and identically distributed, and independent of the claim frequency $N$. As in the preceding chapters, $N$ is, conditionally on $\Theta=\theta$, taken to be $\mathcal{P}oi(\lambda\theta)$ distributed. This construction essentially states that the cost of an accident is for the most part beyond the control of the policyholder. The degree of care exercised by a driver mostly influences the number of accidents, but to a much lesser extent the cost of these accidents. Nevertheless, this assumption seems acceptable in practice, at least as an approximation. Note that the severities $C_1,C_2,\ldots$ are also independent of $\Theta$.

Considering Expression (7.1) for the total cost of claims $S$, the pure premium for a policyholder without claim history is $\lambda\mathrm{E}[C_1]$. This amount will be corrected by a specific relativity according to the level occupied in the bonus-malus scale, that is, according to the claims reported to the company in the past.


The distribution function of $S$ is then given by

$$\Pr[S\le x\mid \Lambda=\lambda,\Theta=\theta]=\sum_{n=0}^{+\infty}\exp(-\lambda\theta)\frac{(\lambda\theta)^n}{n!}\,F^{\star n}(x),\qquad x\ge 0,\qquad(7.2)$$

where $F$ is the common distribution function of the $C_k$s and

$$F^{\star n}(x)=\Pr[C_1+\cdots+C_n\le x]$$

is the $n$-fold convolution of $F$. Computing $F^{\star n}(x)$ amounts to performing an $n$-dimensional integration of the probability density function corresponding to $F$. Together with the sum over $n$ in (7.2), this of course makes the formula to get $\Pr[S\le x\mid \Lambda=\lambda,\Theta=\theta]$ very time consuming. Furthermore, we still have to integrate over the possible values of $\Lambda$ and $\Theta$ to get $\Pr[S\le x]$, that is,

$$\Pr[S\le x]=\sum_k\int_0^{+\infty}w_k\Pr[S\le x\mid \Lambda=\lambda_k,\Theta=\theta]\,dF_\Theta(\theta),\qquad x\ge 0.$$

Fortunately, the Poisson distribution belongs to Panjer's class of counting distributions, for which there exists a recursive algorithm.

7.2.2 Discretization

The computation of the distribution function of $S$ with the help of the Panjer algorithm requires the discretization of the individual claim amounts $C_k$. This amounts to replacing each $C_k$ with a discrete claim cost $C_k^{(\Delta)}$, a multiple of some fixed monetary unit $\Delta$. Usually, actuaries use one of the following two methods to discretize the claim amounts:

Rounding Up

In this case, a step $\Delta$ is selected and the discretized distribution function $F^{(\Delta)}$ is defined as

$$F^{(\Delta)}(x)=F(k\Delta)\quad\text{for }k\Delta\le x<(k+1)\Delta,\qquad k=0,1,2,\ldots\qquad(7.3)$$

so that all the probability mass of the interval $(k\Delta,(k+1)\Delta]$ is placed on $(k+1)\Delta$. This amounts to rounding each claim amount up to the next multiple of $\Delta$; the method is prudent in the sense that $C_k^{(\Delta)}\ge C_k$ with probability 1 (and hence $\mathrm{E}[C_k^{(\Delta)}]\ge\mathrm{E}[C_k]$).


Mass Dispersion

It is also possible to discretize the $C_k$s keeping the expected claim cost unchanged. This is done by spreading the probability mass of each interval $(i\Delta,(i+1)\Delta]$ on the two extremities of the interval (instead of placing all the mass on $(i+1)\Delta$ as before). Define, for $i=0,1,2,\ldots$,

$$f_i^+=\frac{1}{\Delta}\int_{x=i\Delta}^{(i+1)\Delta}\big((i+1)\Delta-x\big)\,dF(x).$$

Using integration by parts, $f_i^+$ can be cast into

$$\begin{aligned}
f_i^+&=\frac{1}{\Delta}\left(\Big[\big((i+1)\Delta-x\big)F(x)\Big]_{x=i\Delta}^{(i+1)\Delta}+\int_{x=i\Delta}^{(i+1)\Delta}F(x)\,dx\right)\\
&=\frac{1}{\Delta}\left(-\Delta F(i\Delta)+\int_{x=i\Delta}^{(i+1)\Delta}F(x)\,dx\right)\\
&=\frac{1}{\Delta}\int_{x=i\Delta}^{(i+1)\Delta}\big(F(x)-F(i\Delta)\big)\,dx.
\end{aligned}$$

Similarly, for $i=0,1,2,\ldots$, define

$$f_{i+1}^-=\frac{1}{\Delta}\int_{x=i\Delta}^{(i+1)\Delta}(x-i\Delta)\,dF(x)=\frac{1}{\Delta}\int_{x=i\Delta}^{(i+1)\Delta}\big(F((i+1)\Delta)-F(x)\big)\,dx.$$

Let us now spread the probability mass $F((i+1)\Delta)-F(i\Delta)$ of the interval $(i\Delta,(i+1)\Delta]$ on its extremities $i\Delta$ (which receives $f_i^+$) and $(i+1)\Delta$ (which receives $f_{i+1}^-$). Then,

$$\Pr[C_1^{(\Delta)}=0]=f_0=F(0)+f_0^+$$
$$\Pr[C_1^{(\Delta)}=i\Delta]=f_i=f_i^-+f_i^+\quad\text{for }i\ge 1.\qquad(7.4)$$

When the discretization is performed according to (7.4), $C_1^{(\Delta)}$ has the same expected value as $C_1$, so that the expected claim cost is indeed preserved.
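A small Python sketch of the mass-dispersion formulas (7.4), computing the integrals of $F$ over each interval numerically; the LogNormal severity and the step are arbitrary choices for the illustration. The final check confirms that the total mass and the mean are (approximately) preserved, up to the truncated tail.

```python
import numpy as np
from scipy import stats

def mass_dispersion(cdf, step, n_atoms, sub=50):
    """Discretize a claim-size cdf on {0, step, ..., (n_atoms-1)*step} as in (7.4)."""
    f = np.zeros(n_atoms)
    for i in range(n_atoms - 1):
        x = np.linspace(i * step, (i + 1) * step, sub)
        dx = x[1] - x[0]
        g = cdf(x) - cdf(i * step)                 # integrand of f_i^+
        h = cdf((i + 1) * step) - cdf(x)           # integrand of f_{i+1}^-
        f[i] += np.sum((g[:-1] + g[1:]) / 2.0) * dx / step
        f[i + 1] += np.sum((h[:-1] + h[1:]) / 2.0) * dx / step
    f[0] += cdf(0.0)
    return f

# Illustrative LogNormal severity and step (hypothetical parameters).
C = stats.lognorm(s=1.0, scale=np.exp(7.0))
step, n_atoms = 50.0, 4000
f = mass_dispersion(C.cdf, step, n_atoms)
atoms = np.arange(n_atoms) * step
print("total mass :", f.sum())                     # ~ 1 (tail beyond the grid ignored)
print("mean check :", (atoms * f).sum(), "vs", C.mean())
```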


7.2.3 Panjer Algorithm

Direct Convolution Approach

The Panjer algorithm then allows us to compute the distribution function of the discretized version of $S$, defined as

$$S^{(\Delta)}=\sum_{k=1}^N C_k^{(\Delta)}.$$

To realize the merit of the Panjer approach, let us first write the formulas in the direct approach. Let us denote as $f_i=\Pr[C_1^{(\Delta)}=i\Delta]$ the probability mass function of $C_1^{(\Delta)}$. The probability mass function of $C_1^{(\Delta)}+\cdots+C_k^{(\Delta)}$ is

$$f_i^{\star k}=\Pr\left[\sum_{j=1}^k C_j^{(\Delta)}=i\Delta\right].$$

In the applications we have in mind, the distribution function of the claim costs satisfies $F(0)=0$. Clearly, with Discretization Method (7.3), we then have $f_0=0$, so that $f_i^{\star k}=0$ if $i\le k-1$ (since $C_1^{(\Delta)}\ge\Delta$ with probability 1). The $f_i^{\star k}$s satisfy

$$f_i^{\star k}=\sum_{j=1}^{i-k+1}f_{i-j}^{\star(k-1)}f_j\quad\text{if }i\ge k.\qquad(7.5)$$

Then, the probability mass function $g_i=\Pr[S^{(\Delta)}=i\Delta]$ of $S^{(\Delta)}$ satisfies

$$g_i=\sum_{k=0}^i\Pr[N=k]\,f_i^{\star k},\qquad i\in\mathbb{N}.\qquad(7.6)$$

This direct computation of the probability mass function of $S^{(\Delta)}$ requires a lot of computation time. Things are even worse with Discretization (7.4), for which $f_0>0$.

Panjer Family

The Panjer formula holds for a class of probability distributions referred to as the Katz family in statistical circles, and as the Panjer family in the actuarial literature. This family contains all the counting distributions such that the relation

$$p_k=\left(a+\frac{b}{k}\right)p_{k-1},\qquad k=1,2,\ldots\qquad(7.7)$$

is fulfilled for some constants $a$ and $b$. The Panjer family contains three elements. Specifically, the probability distributions satisfying (7.7) are

(i) the Poisson distribution, obtained with $a=0$ and $b>0$;
(ii) the Negative Binomial distribution, obtained with $0<a<1$ and $b>-a$;
(iii) the Binomial distribution, obtained with $a<0$ and $b=-a(n+1)$ for some positive integer $n$.


Panjer Algorithm with $f_0=0$

The Panjer algorithm is easily established with probability generating functions. Here, we use another approach, based on conditional expectations. The proof of the Panjer formula relies on the following technical lemma, in which $X_1,X_2,\ldots$ denote independent random variables with common probability mass function $f_i=\Pr[X_k=i]$, $i\in\mathbb{N}$, so that $f_j^{\star n}=\Pr[X_1+\cdots+X_n=j]$.

Lemma 7.1
The relation

$$f_j^{\star n}=\frac{n}{j}\sum_{i=1}^j i\,f_i\,f_{j-i}^{\star(n-1)}$$

is valid for $j\ge n$.

Proof
The proof consists of noting that, on the one hand, for any $j\ge n$,

$$\mathrm{E}\left[X_1\,\middle|\,\sum_{k=1}^n X_k=j\right]=\frac{1}{n}\sum_{k=1}^n\mathrm{E}\left[X_k\,\middle|\,\sum_{k=1}^n X_k=j\right]=\frac{j}{n},$$

and, on the other hand, if $f_j^{\star n}>0$,

$$\begin{aligned}
\mathrm{E}\left[X_1\,\middle|\,\sum_{k=1}^n X_k=j\right]
&=\sum_{i=1}^j i\Pr\left[X_1=i\,\middle|\,\sum_{k=1}^n X_k=j\right]\\
&=\sum_{i=1}^j i\,\frac{\Pr\big[X_1=i\text{ and }\sum_{k=2}^n X_k=j-i\big]}{\Pr\big[\sum_{k=1}^n X_k=j\big]}\\
&=\frac{\sum_{i=1}^j i\,f_i\,f_{j-i}^{\star(n-1)}}{f_j^{\star n}}.
\end{aligned}$$

Equating these two formulas yields the expected result. □

We are now ready to derive the Panjer algorithm when $f_0=0$ (as is the case with Discretization Method (7.3)).

Proposition 7.1 If the probability distribution of $N$ belongs to the Panjer family and if $f_0=0$, then the $g_j$s are obtained recursively from

$$g_j=\sum_{i=1}^j\left(a+\frac{ib}{j}\right)f_i\,g_{j-i},\qquad j=1,2,\ldots\qquad(7.8)$$

starting with $g_0=p_0$.

Proof
Since $C_k^{(\Delta)}\ge\Delta$ almost surely, we clearly have

$$\Pr[S^{(\Delta)}=0]=\Pr[N=0]=p_0.$$


For $j\ge 1$, Lemma 7.1 allows us to write

$$\begin{aligned}
g_j&=\sum_{n=1}^{+\infty}\left(a+\frac{b}{n}\right)p_{n-1}\,f_j^{\star n}\\
&=a\sum_{n=1}^{+\infty}p_{n-1}\sum_{i=1}^j f_i\,f_{j-i}^{\star(n-1)}+\frac{b}{j}\sum_{n=1}^{+\infty}p_{n-1}\sum_{i=1}^j i\,f_i\,f_{j-i}^{\star(n-1)}\\
&=\sum_{i=1}^j\left(a+\frac{ib}{j}\right)f_i\sum_{n=1}^{+\infty}p_{n-1}\,f_{j-i}^{\star(n-1)}\\
&=\sum_{i=1}^j\left(a+\frac{ib}{j}\right)f_i\,g_{j-i},
\end{aligned}$$

which ends the proof. □

Corollary 7.1 (Compound Poisson Case)
If $N\sim\mathcal{P}oi(\lambda)$ ($a=0$ and $b=\lambda$), we get

$$g_i=\begin{cases}\exp(-\lambda)&\text{if }i=0,\\[1mm] \dfrac{\lambda}{i}\displaystyle\sum_{j=1}^i j\,f_j\,g_{i-j}&\text{if }i\ge 1.\end{cases}$$
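Corollary 7.1 translates directly into a few lines of code. In the sketch below, the Poisson parameter and the toy severity pmf are illustrative; `f[0] = 0`, as under Discretization Method (7.3).

```python
import numpy as np

def panjer_poisson(lam, f, n_max):
    """Compound Poisson Panjer recursion (Corollary 7.1); f[0] must be 0."""
    g = np.zeros(n_max + 1)
    g[0] = np.exp(-lam)
    for i in range(1, n_max + 1):
        m = min(i, len(f) - 1)
        j = np.arange(1, m + 1)
        g[i] = (lam / i) * np.sum(j * f[j] * g[i - j])
    return g                        # g[i] = Pr[S_Delta = i * step]

# toy severity pmf on {0, 1, 2, 3} monetary units (illustrative numbers)
f = np.array([0.0, 0.5, 0.3, 0.2])
g = panjer_poisson(lam=0.2, f=f, n_max=30)
print(g.sum())                      # ~ 1
print(1.0 - np.cumsum(g))           # tail probabilities Pr[S_Delta > i * step]
```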

Panjer Algorithm with $f_0>0$

Let us now consider the case $f_0>0$ (resulting from Discretization Method (7.4)).

Proposition 7.2 If the probability distribution of $N$ belongs to the Panjer family and if $f_0\neq 0$, then the $g_j$s are obtained recursively from

$$g_j=\frac{1}{1-af_0}\sum_{i=1}^j\left(a+\frac{ib}{j}\right)f_i\,g_{j-i},\qquad j=1,2,\ldots\qquad(7.9)$$

starting from $g_0=\varphi_N(f_0)$, where $\varphi_N$ denotes the probability generating function of $N$.

Proof
Since $f_0^{\star k}=(f_0)^k$, $k\in\mathbb{N}$, we easily see that

$$g_0=\sum_{k=0}^{+\infty}p_k(f_0)^k=\varphi_N(f_0).$$

Following the reasoning in the proof of Proposition 7.1, we get for $j>0$

$$\begin{aligned}
g_j&=\sum_{n=1}^{+\infty}a\,p_{n-1}\sum_{i=0}^j f_i\,f_{j-i}^{\star(n-1)}+\sum_{n=1}^{+\infty}\frac{b}{j}\,p_{n-1}\sum_{i=1}^j i\,f_i\,f_{j-i}^{\star(n-1)}\\
&=af_0\sum_{n=1}^{+\infty}p_{n-1}\,f_j^{\star(n-1)}+\sum_{i=1}^j\left(a+\frac{ib}{j}\right)f_i\sum_{n=1}^{+\infty}p_{n-1}\,f_{j-i}^{\star(n-1)}\\
&=af_0\,g_j+\sum_{i=1}^j\left(a+\frac{ib}{j}\right)f_i\,g_{j-i},
\end{aligned}$$

and solving this last identity for $g_j$ gives (7.9), which ends the proof. □
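A sketch of the general recursion (7.9). The $(a,b)$ parametrizations and probability generating functions used in the examples are the standard ones for the Poisson and Negative Binomial distributions (for the latter, with pmf $p_k=\binom{k+r-1}{k}p^r(1-p)^k$, one has $a=1-p$ and $b=(r-1)(1-p)$); the numeric inputs are illustrative.

```python
import numpy as np

def panjer_general(a, b, pgf_N, f, n_max):
    """Panjer recursion (7.9); works whether or not f[0] > 0."""
    g = np.zeros(n_max + 1)
    g[0] = pgf_N(f[0])                        # g_0 = phi_N(f_0)
    for j in range(1, n_max + 1):
        m = min(j, len(f) - 1)
        i = np.arange(1, m + 1)
        g[j] = np.sum((a + i * b / j) * f[i] * g[j - i]) / (1.0 - a * f[0])
    return g

f = np.array([0.1, 0.5, 0.3, 0.1])            # toy severity pmf with f[0] > 0

# Poisson(lam): a = 0, b = lam, pgf phi(t) = exp(lam (t - 1))
lam = 0.2
g_poi = panjer_general(0.0, lam, lambda t: np.exp(lam * (t - 1.0)), f, 30)

# Negative Binomial(r, p): a = 1 - p, b = (r - 1)(1 - p), pgf (p / (1 - (1-p) t))^r
r, p = 1.5, 0.8
g_nb = panjer_general(1.0 - p, (r - 1.0) * (1.0 - p),
                      lambda t: (p / (1.0 - (1.0 - p) * t)) ** r, f, 30)
print(g_poi.sum(), g_nb.sum())                # both ~ 1
```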


Computation of the Distribution Function of $S$

Clearly,

$$\Pr[S^{(\Delta)}>i\Delta]=1-\sum_{j=0}^i g_j,\qquad i\in\mathbb{N},\quad\text{where }g_j=\Pr[S^{(\Delta)}=j\Delta].$$

The recursive formula

$$\Pr[S^{(\Delta)}>i\Delta]=\begin{cases}1&\text{if }i=-1,\\ \Pr[S^{(\Delta)}>(i-1)\Delta]-g_i&\text{if }i\in\mathbb{N},\end{cases}$$

gives the distribution of $S^{(\Delta)}$.

7.3 Introducing a Deductible within a Posteriori Ratemaking

Often, the $r_l$s for the high levels $l$ are so large that the system has to be softened before a possible commercial implementation; this softening results in financial instability (since the company then faces a progressive decrease of the average premium level because of a clustering of the policyholders in the high-discount classes). To avoid this deficiency, the premium increase that the policyholder has to pay when he goes up in the scale could be (at least partly) replaced by a deductible applied to the claims filed by the policyholder during the following year. The company compensates for the reduced penalties in the malus zone with the deductibles paid by policyholders who report claims whilst in the malus zone. This can be commercially attractive, since the policyholders are penalized only if they file claims in the future. The amount of these deductibles depends on the level attained by the policyholder and can be applied either annually or claim by claim.

7.3.1 Annual Deductible

Let us consider a policyholder occupying level $l$ in the scale. If this policyholder is subject to the a posteriori premium corrections induced by the bonus-malus scale, he will have to pay $r_l\,\lambda\mathrm{E}[C_1]$ to be covered by the company. If the policyholder is subject to the annual deductible instead, he will have to pay the pure premium $\lambda\mathrm{E}[C_1]$ and bear $\min(S,d_l)$ himself. The indifference principle is now expressed for level $l$ by the equation

$$r_l\,\lambda\mathrm{E}[C_1]=\lambda\mathrm{E}[C_1]+\mathrm{E}[\min(S,d_l)].\qquad(7.10)$$


Equation (7.10) is to be solved for all levels $l$ such that $r_l>100\,\%$ (that is, for all levels in the malus zone). The premium surcharge $(r_l-1)\lambda\mathrm{E}[C_1]$ is thus replaced with an annual deductible $d_l$, and the relation (7.10) ensures that the substitution is actuarially fair.

In practice, equation (7.10) does not possess an explicit solution, so that numerical techniques have to be used. Panjer's algorithm is employed to derive the distribution of $S$. The claim amounts are discretized according to Method (7.4) and the probability mass is concentrated on 500 points. Then, the equation can be solved using appropriate routines available in the IML package of SAS®.
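Instead of the Panjer-based computation just described, a quick Monte Carlo sketch can also locate $d_l$ from (7.10); the frequency, severity mean and relativity below are invented, and the grid search is deliberately coarse.

```python
import numpy as np

# Toy solver for the annual-deductible equation (7.10) by simulation;
# lam, mean_c and r_l are invented inputs, not values from the book.
rng = np.random.default_rng(seed=7)
lam, mean_c, r_l = 0.2, 2000.0, 1.5
n_sim = 200_000
N = rng.poisson(lam, n_sim)
S = np.array([rng.exponential(mean_c, n).sum() for n in N])   # aggregate claims

target = (r_l - 1.0) * lam * mean_c        # (7.10): E[min(S, d_l)] = (r_l - 1) lam E[C_1]
grid = np.linspace(0.0, 60_000.0, 301)     # coarse grid of candidate deductibles
emin = np.array([np.minimum(S, d).mean() for d in grid])
d_l = grid[np.argmin(np.abs(emin - target))]
print("annual deductible d_l ~", round(float(d_l)))
```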

7.3.2 Per Claim Deductible

Of course, the deductible could also be applied to each claim filed by the policyholder. The indifference principle invoked above will again be used to determine the amount of the deductible. Considering a policyholder in level $l$, he will have to pay $r_l\,\lambda\mathrm{E}[C_1]$ if he is subject to the a posteriori corrections induced by the bonus-malus scale. If, on the contrary, a fixed deductible $d_l$ is applied per claim, he will have to pay $\lambda\mathrm{E}[C_1]$ and bear $\min(C_k,d_l)$ for each of the claims $C_k$ reported to the company. Note that the expected number of claims is now $r_l\lambda$, because past claims history is used to update the claim frequency distribution. According to the indifference principle, the amount of deductible $d_l$ for a policyholder in level $l$ is the solution to the equation

$$r_l\,\lambda\mathrm{E}[C_1]=\lambda\mathrm{E}[C_1]+r_l\,\lambda\Big(\mathrm{E}\big[C_1\,\mathbb{1}\{C_1<d_l\}\big]+d_l\Pr[C_1\ge d_l]\Big)=\lambda\mathrm{E}[C_1]+r_l\,\lambda\,\mathrm{E}[\min(C_1,d_l)].\qquad(7.11)$$
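Similarly, the per-claim equation (7.11) reduces to $\mathrm{E}[\min(C_1,d_l)]=\mathrm{E}[C_1](r_l-1)/r_l$ and can be solved by root finding. With an Exponential severity (illustrative mean of 2000 monetary units), the result can be checked against the closed form $d_l=\mathrm{E}[C_1]\ln r_l$ mentioned in Section 7.4.4 below.

```python
import numpy as np
from scipy import stats, optimize

mean_c = 2000.0                               # illustrative expected claim cost
C = stats.expon(scale=mean_c)

def e_min(d, n=2000):
    """E[min(C_1, d)] = integral_0^d (1 - F(x)) dx (layer representation)."""
    x = np.linspace(0.0, d, n)
    sf = C.sf(x)
    return float(np.sum((sf[:-1] + sf[1:]) / 2.0) * (x[1] - x[0]))

for r_l in (1.22, 1.507, 1.973):              # sample relativities
    target = mean_c * (r_l - 1.0) / r_l       # (7.11) rearranged
    d = optimize.brentq(lambda d: e_min(d) - target, 1e-6, 50.0 * mean_c)
    print(f"r_l = {r_l:.3f}:  d_l = {d:8.1f}   closed form = {mean_c * np.log(r_l):8.1f}")
```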


Let us now give the equations providing the $d_l$s in the mixed case, where the relativities in the malus zone are reduced by a factor $\alpha$ and the difference is compensated by a deductible. In the case of an annual deductible, the indifference principle allows us to write

$$r_l\,\lambda\mathrm{E}[C_1]=(1-\alpha)\,r_l\,\lambda\mathrm{E}[C_1]+\mathrm{E}[\min(S,d_l)]=(1-\alpha)\,r_l\,\lambda\mathrm{E}[C_1]+\mathrm{E}\big[S\,\mathbb{1}\{S<d_l\}\big]+d_l\Pr[S\ge d_l].\qquad(7.12)$$


… monetary units. The distribution of $S$ is then determined by Panjer's algorithm. Claim amounts are discretized according to Method (7.4).

7.4.3 Annual Deductible

The relativities computed for the scale −1/top are displayed in the second column of Table 7.1, and those for the scale −1/+2 in Table 7.2. In the pure bonus-malus case, it is thus clear that the $r_l$s associated with the upper levels are considerable (more than 300 % for level 5 in the case of the scale −1/+2). Now, let us replace the $r_l$s in the malus zone (i.e. levels 1 to 5) with an annual deductible $d_l$. In order to obtain the $d_l$s from equation (7.10), we first discretize the claim sizes using Method (7.4). Finally, (7.10) is solved numerically using routines available from the SAS®/IML package.

The third column of each table displays the new relativities. In this case, the maluses disappear ($r_l=100\,\%$ for $l=1,\ldots,5$) and are compensated for by the deductibles listed in the two last columns. The fourth column shows the deductible to be applied if the loss amounts are Negative Exponentially distributed and the last column shows the deductible to be applied if the loss amounts are LogNormally distributed. Since the LogNormal distribution has a thicker tail than the Negative Exponential one, we expect larger amounts of deductible for the former. This is indeed the case, as can be seen from Tables 7.1 and 7.2.

The very high $r_l$s in the second column induce high amounts of deductible, even in the Negative Exponential case. Therefore, this solution seems difficult (if not impossible) to implement in practice.

Table 7.1 Results for an annual deductible varying according to the level occupied in the malus zone, for the scale −1/top.

Level l    r_l        r_l with deductible    d_l if C_1 ∼ Exp    d_l if C_1 ∼ LogNor
5          197.3 %    100 %                  18 367              35 253
4          170.9 %    100 %                  14 028              29 303
3          150.7 %    100 %                  10 472              23 544
2          134.8 %    100 %                   7 470              17 934
1          122.0 %    100 %                   4 897              12 452
0           54.7 %     54.7 %                     0                   0

Table 7.2 Results for an annual deductible varying according to the level occupied in the malus zone, for the scale −1/+2.

Level l    r_l        r_l with deductible    d_l if C_1 ∼ Exp    d_l if C_1 ∼ LogNor
5          309.1 %    100 %                  34 576              50 794
4          241.4 %    100 %                  25 076              42 704
3          207.7 %    100 %                  20 004              37 241
2          142.9 %    100 %                   9 027              20 933
1          130.2 %    100 %                   6 563              16 079
0           62.4 %     62.4 %                     0                   0


7.4.4 Per Claim Deductible

In this case, there is an analytical solution when the claims are Negative Exponentially distributed: the deductible $d_l$ involved in (7.11) is simply given by $\ln(r_l)$ times the expected claim cost. No explicit solution is available when the claim amounts are LogNormally distributed, and numerical procedures have to be used in this case to find the deductibles $d_l$.

Table 7.3 displays the results obtained for a deductible per claim for the scale −1/top, and Table 7.4 for the scale −1/+2, when the premium paid by the policyholder is held constant whatever the claim history. As was the case for the annual deductible, the second column gives the relativities associated with each level of the scale in the case of a classical bonus-malus system. The third column gives the relative premium in the case of a scale with the deductible system. The fourth column shows the deductible to be applied if the loss amounts are Negative Exponentially distributed and the last column shows the deductible to be applied if the loss amounts are LogNormally distributed.

Again, the amounts of deductible displayed in the last two columns are very high compared to the annual premium, especially for the scale −1/+2. This results from the severe $r_l$s listed in column 2. In order to obtain acceptable amounts of deductible while keeping the financial stability of the system, in the next section we will combine softened penalties in the malus zone with moderate deductibles.

Table 7.3 Results for a deductible per claim varying according to the level occupied in the malus zone, for the scale −1/top.

Level l    r_l        r_l with deductible    d_l if C_1 ∼ Exp    d_l if C_1 ∼ LogNor
5          197.3 %    100 %                  14 041              16 254
4          170.9 %    100 %                  11 073              12 191
3          150.7 %    100 %                   8 474               8 941
2          134.8 %    100 %                   6 170               6 288
1          122.0 %    100 %                   4 108               4 080
0           54.7 %     54.7 %                     0                   0

Table 7.4 Results for a deductible per claim varying according to the level occupied in the malus zone, for the scale −1/+2.

Level l    r_l        r_l with deductible    d_l if C_1 ∼ Exp    d_l if C_1 ∼ LogNor
5          309.1 %    100 %                  23 317              31 592
4          241.4 %    100 %                  18 209              22 633
3          207.7 %    100 %                  15 101              17 803
2          142.9 %    100 %                   7 376               7 651
1          130.2 %    100 %                   5 451               5 502
0           62.4 %     62.4 %                     0                   0

Bonus-Malus Systems with Varying Deductibles 289<br />

7.4.5 Annual Deductible in the Mixed Case<br />

Another solution would be to keep the system <strong>of</strong> maluses but to reduce the penalties r l<br />

applied to the policyholders in the malus zone, for example by choosing = 20 %. In<br />

exchange for these reduced r l s, the policyholders are subject to an annual deductible on the<br />

claims they will eventually file. Equation (7.12) used to compute the annual deductible then<br />

becomes<br />

r l EC 1 = 80 %r l EC 1 + ESS 100 %.<br />

Tables 7.5 and 7.6 display the numerical results. The r l s displayed in column 3 are equal<br />

to 80 % <strong>of</strong> those in column 2, except for the bonus level 0. Note that the reduced level 1 for<br />

the scale −1/top enters the bonus zone. The annual deductibles in the Negative Exponential<br />

case (column 4) are now reasonable (about twice the annual premium for most levels) but<br />

those in the LogNormal case remain considerable (up to five times the pure premium).<br />

7.4.6 Per <strong>Claim</strong> Deductible in the Mixed Case<br />

If we combine s<strong>of</strong>tened penalties for the bonus-malus system with deductibles per claim,<br />

Equation (7.13) becomes<br />

20 %EC 1 = EC 1 C 1


Table 7.7 Results for a deductible per claim varying according to the level occupied in the malus zone, combined with reduced relativities $r_l$, for the scale −1/top.

Level l    r_l        r_l with deductible    d_l if C_1 ∼ Exp    d_l if C_1 ∼ LogNor
5          197.3 %    157.8 %                4 610               4 604
4          170.9 %    136.7 %                4 610               4 604
3          150.7 %    120.6 %                4 610               4 604
2          134.8 %    107.8 %                4 610               4 604
1          122.0 %     97.6 %                4 610               4 604
0           54.7 %     54.7 %                    0                   0

Table 7.8 Results for a deductible per claim varying according to the level occupied in the malus zone, combined with reduced relativities $r_l$, for the scale −1/+2.

Level l    r_l        r_l with deductible    d_l if C_1 ∼ Exp    d_l if C_1 ∼ LogNor
5          309.1 %    247.3 %                4 610               4 604
4          241.4 %    193.1 %                4 610               4 604
3          207.7 %    166.2 %                4 610               4 604
2          142.9 %    114.3 %                4 610               4 604
1          130.2 %    104.2 %                4 610               4 604
0           62.4 %     62.4 %                    0                   0

As already mentioned, since this equation does not depend on the level $l$, the amount of the deductible is the same for each level of the scale.

Tables 7.7 and 7.8 display the numerical results. The third column gathers the relativities: those in the malus zone have been reduced by 20 % compared to column 2. The last two columns display the amounts of deductible. In this case, the amounts of deductible are reasonable and can be implemented in practice (about 150 % of the annual pure premium). Quite surprisingly, the LogNormal distribution now produces smaller deductibles than its Negative Exponential counterpart.

7.5 Further Reading and Bibliographic Notes

This chapter is based on Pitrebois, Denuit & Walhin (2005) for the most part.

The Panjer family of counting distributions is known as the Katz family in applied probability. This family has attracted a lot of attention in the literature, due to the fact that it contains underdispersed (Binomial), equidispersed (Poisson) and overdispersed (Negative Binomial) distributions. As a result of Panjer's (1981) publication, a lot of other articles covering similar recursion relations have appeared in the actuarial literature. Multivariate versions of the Panjer algorithm will be used in Chapter 9.

Claim amounts have also been taken into account by Bonsdorff (2005), who studied bonus-malus systems where the transitions between the bonus levels in the entire interval $[a,b]$ are determined by the number of claims of the previous year and the total amount of claims of the previous year.

Vandebroek (1993) analysed the efficiency of bonus-malus systems and partial coverages in preventing moral hazard problems, by means of stochastic dynamic programming. See also Holtan (1994) and Lemaire & Zi (1994a) for the trade-off between bonus-malus systems and deductibles.


8

Transient Maximum Accuracy Criterion

8.1 Introduction

8.1.1 From Stationary to Transient Distributions

All developments so far have been based on the stationary distribution of the Markov process describing the trajectory of the policyholder in the bonus-malus scale. As Borgan, Hoem & Norberg (1981) objected, an asymptotic criterion is only moderately relevant for bonus-malus systems needing relatively long periods to reach their steady state, since policies are in force only during a limited number of insurance periods. These authors modified the criterion in order to take into account the rating error for new and young policies. Following Norberg (1976), Borgan et al. (1981) measured the performance of a bonus-malus system by a weighted average of the expected squared rating errors for selected insurance periods.

In Chapter 4, the relativities associated with the levels of the bonus-malus scale were computed on the basis of an asymptotic criterion. The implicit assumption behind the results in the preceding chapters is thus that the Markov process reaches its steady state after a relatively short period, as is the case for the −1/top bonus-malus scale for instance. If a majority of the policies are far from the steady state, it seems desirable to modify the criterion so as to take into account the rating error for new policies and for policies of a moderate age as well.

8.1.2 A Practical Example: Creating a Special Scale for New Entrants

Before entering into the mathematical developments, let us describe a concrete situation where the asymptotic criterion used in the previous chapters is no longer relevant, and


where the transient regime must be considered. This concerns new entrants in a bonus-malus system (especially the young drivers). Often, age is included in the a priori ratemaking, raising the premium for young drivers (especially young males). The large premium surcharges imposed on young drivers pose social problems in many countries. As shown by Boucher & Denuit (2006), the heterogeneity is huge inside classes of young drivers. Once individual factors have been accounted for (on the basis of a fixed effects model for panel data), young drivers even became less risky on average than mature and old ones in the empirical study conducted by Boucher & Denuit (2006). In fact, the vast majority of claims reported by young drivers is concentrated on just a few policies.

In addition to the severe explicit penalties contained in the a priori tariff, young drivers enter the bonus-malus scale far above the average level they should occupy given their annual expected claim frequency. There is thus an implicit penalty for new drivers (added to the explicit penalty found in most commercial price lists), since the relativity corresponding to the access level of all bonus-malus systems is in every case substantially higher than the average stationary relativity. The implicit surcharge paid by newcomers can be evaluated by comparing the access level to the stationary level for the sub-population of the policyholders insured for a period of 20 years, say.

Young, inexperienced drivers generally cause many more accidents than the other categories of policyholders. At the same time, classes composed of young drivers are more heterogeneous than the other ones: the numerous claims are filed by a minority of insured drivers (causing several claims per year). There are basically two ways to take this phenomenon into account when designing bonus-malus systems:

• Either the more important residual heterogeneity is recognized (by a larger variance of the random effect) and particular transition rules (i.e. heavier penalties when a claim is filed) are imposed on young drivers during the first few years.
• Or young drivers are first placed in a special −1/top scale, and once the bottom level is attained they are sent to the regular bonus-malus scale (entering that scale at their average stationary level).

Let us follow the second approach. The bonus-malus scale is as follows: young drivers are first placed in the highest level of a −1/top scale (with six levels, say). Careful young drivers then reach level 0 in five years, and enter the regular scale at that time (the regular bonus-malus scale can be of the −1/+2 type, for instance).

In such a case, the levels of the initial −1/top scale form a transient class in the Markov chain describing the trajectory of the policyholders across the bonus-malus scales. The policyholders will all leave the initial scale sooner or later and never come back to it, so that the associated stationary probabilities are all equal to 0. Applying the asymptotic criterion to compute the relativities of such a hybrid bonus-malus system does not account for the initial scale. The transient distribution will be influenced by the −1/top scale, and should therefore be used in this case to determine the relativities associated with the −1/top initial scale and with the regular −1/+2 scale.


8.1.3 Agenda

The transient regime is discussed in Section 8.2, where the convergence of bonus-malus systems is analysed. The modified criterion with a quadratic loss function is presented in Section 8.3. The exponential loss function is briefly discussed in Section 8.4.

In Section 8.5, we give the results obtained for the examples studied in the preceding chapters (former compulsory Belgian bonus-malus scale, −1/top scale and −1/+2 scale) when the transient maximum accuracy criterion is used. All the results have been computed with the formulas taking the a priori ratemaking into account. We first examine the convergence to the steady state. Then we give the transient probability distributions and the transient relativities obtained when using a uniform initial distribution and a uniform distribution of the age of policy. We also compare the relativities computed using the transient maximum accuracy criterion to the relativities obtained with the help of the asymptotic maximum accuracy criterion. Finally, we give the evolution of the expected financial income.

Many EU insurers have recently started to compete on the basis of bonus-malus systems. Because of marketing and competition for market shares, several insurers now offer the best level 'for life': provided the insured drivers reach level 0, they are allowed to stay in that level whatever the claims reported to the company. Note however that insurance companies remain free to cancel the policy after each claim. There is thus a super bonus level: the driver reaching level 0 of the scale is then allowed to 'claim for free'. These gifts to the best drivers are in contradiction with the actuarial and economic purposes of a posteriori ratemaking systems. They are nevertheless very efficient from the marketing point of view, in order to keep the best drivers in the portfolio. There is thus an absorbing state in the Markov model describing the trajectory of the driver in the bonus-malus scale. Consequently, the stationary distribution is degenerate, placing a unit probability mass in level 0. Making level 0 absorbing for the Markov chain thus forbids the use of the stationary distribution. This particular case will be considered in Section 8.6.

All the numerical illustrations of this chapter are based on Portfolio A.

8.2 Transient Behaviour and Convergence of Bonus-Malus Scales

A method of computation of the convergence rate based on the eigenvalues of the transition matrix has been discussed in Chapter 4. Here, we examine the evolution of the total variation distance between the transient distribution $\{p_l^{(n)}(\lambda),\ l=0,1,\ldots,s\}$ and the stationary distribution $\{\pi_l(\lambda),\ l=0,1,\ldots,s\}$:

$$d_{TV}\big(p^{(n)}(\lambda),\boldsymbol{\pi}(\lambda)\big)=\sum_{l=0}^s\big|p_l^{(n)}(\lambda)-\pi_l(\lambda)\big|,\qquad n=0,1,2,\ldots$$

for some given expected annual claim frequency $\lambda$. Considering a policyholder picked at random in the portfolio, let us denote as $L_n$ the level occupied by this policyholder in the bonus-malus scale after $n$ years, and as $L$ the level occupied once the stationary regime has been reached. The convergence can thus be assessed with

$$d_{TV}(n)=\sum_{l=0}^s\big|\Pr[L_n=l]-\Pr[L=l]\big|
=\sum_{l=0}^s\left|\sum_k w_k\int_0^{+\infty}\Big(p_l^{(n)}(\lambda_k\theta)-\pi_l(\lambda_k\theta)\Big)\,dF_\Theta(\theta)\right|\qquad(8.1)$$

for $n=0,1,2,\ldots$, which is the sum over all levels $l$ of the absolute difference between the probability for a policyholder to be in level $l$ after $n$ periods and the probability for this policyholder to be in level $l$ once the stationary state is reached.

We can assume that the convergence to the steady state is acceptable when we reach $n_0$ such that

$$d_{TV}(n_0)\le\epsilon\qquad(8.2)$$

for some fixed $\epsilon>0$. The convergence can then be checked by computing and analysing the evolution of (8.1) with $n$.
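For a single expected frequency $\lambda$ (so that the mixing over $\Lambda$ and $\Theta$ in (8.1) is left out), the total variation distance can be followed over time with a few matrix products. The sketch below uses the 6-level −1/top scale with an arbitrary $\lambda=0.15$ and a uniform initial distribution.

```python
import numpy as np

lam, s = 0.15, 6
p0 = np.exp(-lam)                      # probability of a claim-free year
P = np.zeros((s, s))
for l in range(s):
    P[l, max(l - 1, 0)] = p0           # one level down after a claim-free year
    P[l, s - 1] += 1.0 - p0            # straight to the top otherwise

# stationary distribution: left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

p = np.full(s, 1.0 / s)                # uniform initial distribution p^(0)
for n in range(1, 21):
    p = p @ P                          # p^(n) = p^(n-1) P
    if n % 5 == 0:
        print(f"n = {n:2d}   d_TV = {np.abs(p - pi).sum():.6f}")
```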

Example 8.1 (Former Compulsory Belgian Bonus-Malus Scale) The convergence of the former compulsory Belgian bonus-malus scale is assessed using the evolution of $C(n)=d_{TV}(n)$ displayed in Figure 8.1. We notice a fast convergence during the first 20 years ($C(n)$ decreasing from 0.9 to 0.2), followed by a very slow convergence thereafter.


8.3 Quadratic Loss Function<br />

8.3.1 Transient Maximum Accuracy Criterion<br />

In this chapter, a policyholder picked at random from the portfolio is now characterized by<br />

three random variables: , and A. As before, is the expected claim frequency derived<br />

from the a priori ratemaking and the residual effect due to the heterogeneity remaining<br />

inside each risk class. The integer-valued random variable A then represents the age <strong>of</strong> the<br />

policy in the portfolio and will enable us to take the transient behaviour <strong>of</strong> a bonus-malus<br />

system into account. The probability mass function <strong>of</strong> A is denoted as<br />

PrA = n = a n n= 1 2<br />

In words, this means that a proportion a n <strong>of</strong> the policies in the portfolio have been in force<br />

for n years.<br />

We have already seen that the assumption <strong>of</strong> the independence between and is<br />

reasonable. As for and A, it seems that they are likely to be correlated. Indeed, it seems<br />

probable that the age <strong>of</strong> the policyholder, which is included in the a priori ratemaking, is<br />

correlated with the age <strong>of</strong> the policy. However, in order to simplify the computation, we will<br />

further assume that and A are independent. We also assume the independence between <br />

and A.<br />

Recall from Chapter 4 that the sequence of levels occupied by the policyholder in the bonus-malus scale is denoted as L_0, L_1, L_2, …. Here, we denote as L_A the level occupied by a policyholder subject to the bonus-malus system for A years. Let us assume that the relativity applied to a policyholder picked at random from the portfolio is r_{L_A}^{(A)}. For a policyholder with age of policy n, this relativity becomes r_{L_n}^{(n)}. As a result, for each group of policyholders with age of policy n, the goal is then to minimize the expected squared rating error

Q_n = E\big[ (\Theta - r_{L_A}^{(A)})^2 \,\big|\, A = n \big] = E\big[ (\Theta - r_{L_n}^{(n)})^2 \big]
    = \sum_k w_k \int_0^{+\infty} \sum_{l=0}^{s} (\vartheta - r_l^{(n)})^2 \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta).    (8.3)

The solution is given by

r_l^{(n)} = E[\Theta \mid L_A = l, A = n]
          = \frac{\int_0^{+\infty} \vartheta \sum_k w_k \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta)}{\int_0^{+\infty} \sum_k w_k \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta)}.    (8.4)

We see that (4.13) is a limiting form of equation (8.4) when n tends to infinity.
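Once a distribution is chosen for \Theta, (8.4) can be evaluated numerically. The sketch below is a minimal illustration under the same simplifying assumptions as before (one risk class, −1/top scale, reusing top_scale_matrix from the previous sketch); it takes \Theta Gamma distributed with unit mean, with shape parameter a_shape = 1.0 an arbitrary illustrative choice, and replaces the integral over F_\Theta by a Riemann sum on a grid. The numbers it produces therefore depend on these assumptions and are not those of the tables in this chapter.

```python
import numpy as np
from math import gamma as gamma_fn

a_shape = 1.0                              # Gamma shape (illustrative)
theta = np.linspace(1e-4, 8.0, 2000)       # grid for the random effect
d_theta = theta[1] - theta[0]
# Gamma(a, rate a) density, so that E[Theta] = 1
f_theta = (a_shape**a_shape * theta**(a_shape - 1)
           * np.exp(-a_shape * theta) / gamma_fn(a_shape))

lam, s = 0.1, 5

def transient_relativities(n, p0):
    """r_l^{(n)} = E[Theta | L_n = l] of (8.4), one risk class; the
    mixing integral over F_Theta is approximated by a Riemann sum."""
    num = np.zeros(s + 1)
    den = np.zeros(s + 1)
    for t, f in zip(theta, f_theta):
        pn = p0 @ np.linalg.matrix_power(top_scale_matrix(lam * t), n)
        num += t * pn * f * d_theta        # E[Theta; L_n = l]
        den += pn * f * d_theta            # Pr[L_n = l]
    return num / den

p0 = np.full(s + 1, 1 / (s + 1))           # uniform initial distribution
print(np.round(100 * transient_relativities(1, p0), 1))  # in percent
```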

The asymptotic criterion Q seems reasonable when a majority of the risks are close to the steady state. In practice, however, real portfolios will often have a substantial fraction of comparatively young policies. Then it is desirable to obtain a solution for the relativities not only by means of Q but also by means of Q_n for various finite n. This is why we introduce a new criterion based on a weighted average of the form

\bar{Q} = E\big[(\Theta - r_{L_A}^{(A)})^2\big] = \sum_{n=1}^{+\infty} a_n Q_n
        = \sum_{n=1}^{+\infty} a_n \sum_k w_k \int_0^{+\infty} \sum_{l=0}^{s} (\vartheta - r_l^{(n)})^2 \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta).    (8.5)

The solution of (8.5) is then given by

\bar{r}_l = E[\Theta \mid L_A = l] = \sum_{n=1}^{+\infty} r_l^{(n)} \Pr[A = n \mid L_A = l]
          = \frac{\sum_{n=1}^{+\infty} a_n \int_0^{+\infty} \vartheta \sum_k w_k \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta)}{\sum_{n=1}^{+\infty} a_n \int_0^{+\infty} \sum_k w_k \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta)}.    (8.6)

Remark 8.1 Let us mention that if the insurance company does not enforce any a priori ratemaking system, all the \lambda_k's are equal to \lambda and (8.6) reduces to the formula

\bar{r}_l = \frac{\sum_{n=1}^{+\infty} a_n \int_0^{+\infty} \vartheta \, p_l^{(n)}(\vartheta\lambda) \, dF_\Theta(\vartheta)}{\sum_{n=1}^{+\infty} a_n \int_0^{+\infty} p_l^{(n)}(\vartheta\lambda) \, dF_\Theta(\vartheta)}

that has been derived in Borgan, Hoem & Norberg (1981).
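Numerically, the age-weighted relativities (8.6) come almost for free once (8.4) is in place: it suffices to mix the numerator and the denominator of (8.4) over the a_n's. A sketch, reusing the grid, f_theta and top_scale_matrix from the snippets above (same illustrative assumptions, function name ours):

```python
def bar_relativities(ages, p0):
    """bar r_l of (8.6): the numerator and denominator of (8.4) are
    mixed over the age-of-policy distribution ages = (a_1, a_2, ...)."""
    num = np.zeros(s + 1)
    den = np.zeros(s + 1)
    for n, a_n in enumerate(ages, start=1):
        for t, f in zip(theta, f_theta):
            pn = p0 @ np.linalg.matrix_power(top_scale_matrix(lam * t), n)
            num += a_n * t * pn * f * d_theta
            den += a_n * pn * f * d_theta
    return num / den

# uniform age distribution over 5 years, as in Section 8.5
print(np.round(100 * bar_relativities(np.full(5, 0.2), p0), 1))
```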

Example 8.2 (Former Compulsory Belgian Bonus-Malus Scale) We saw from Figure 8.1 that the steady state was approached at a slow rate for the 23-level Belgian bonus-malus scale. This is why we selected 50 years in the distribution of policy age. The transient relativities presented in Table 8.2 correspond to the transient distributions displayed in Table 8.1. They have all been computed with a uniform initial distribution (\Pr[L_0 = l] = 1/23 for l = 0, 1, …, 22).

We observe that the transient relativities slowly converge to the steady state relativities given in the last column of Table 8.2, but also that they generally underestimate these stationary relativities. This agrees with the fact that the transient probabilities of being in level 0 underestimate the steady state probability of being in this level. Therefore, the bonuses of levels 0, 1, 2, 3 and 4 must be greater and the maluses of levels 5 to 22 must be smaller to ensure financial balance. Moreover, we clearly see that, in the first steps of the transient behaviour, there are more levels granting a bonus in order to balance the lack (with respect to the stationary state) of policyholders in the lowest levels.

In addition to the uniform initial distribution, let us now consider a ‘top’ distribution (78 % of the policyholders are concentrated in level 22 at time 0 and each other level contains 1 % of the policyholders) and a ‘bottom’ distribution (78 % of the policyholders are placed in level 0 at time 0 and the others are spread uniformly over levels 1 to 22). Let us mention that all the policyholders could not be placed in level 0 or 22 because Equation (8.4) is not defined when p_l^{(n)}(\cdot) equals 0. The results are displayed in Tables 8.3 and 8.4. We can see there


Table 8.1 Evolution of the uniform initial distribution for the former compulsory Belgian bonus-malus scale.

Level l  Pr[L_0=l]  Pr[L_10=l]  Pr[L_20=l]  Pr[L_30=l]  Pr[L_40=l]  Pr[L_50=l]  Pr[L_inf=l]
22       4.3 %      4.6 %       5.1 %       5.3 %       5.3 %       5.3 %       5.4 %
21       4.3 %      3.4 %       3.6 %       3.7 %       3.8 %       3.8 %       3.8 %
20       4.3 %      2.8 %       2.8 %       2.9 %       2.9 %       2.9 %       2.9 %
19       4.3 %      2.4 %       2.3 %       2.3 %       2.3 %       2.3 %       2.3 %
18       4.3 %      2.3 %       2.1 %       2.0 %       1.9 %       1.9 %       1.9 %
17       4.3 %      2.5 %       1.9 %       1.8 %       1.7 %       1.7 %       1.6 %
16       4.3 %      2.6 %       1.8 %       1.6 %       1.5 %       1.5 %       1.5 %
15       4.3 %      2.6 %       1.8 %       1.5 %       1.4 %       1.4 %       1.3 %
14       4.3 %      2.7 %       1.7 %       1.5 %       1.4 %       1.3 %       1.3 %
13       4.3 %      2.8 %       1.8 %       1.5 %       1.3 %       1.3 %       1.2 %
12       4.3 %      4.1 %       2.0 %       1.5 %       1.4 %       1.3 %       1.2 %
11       4.3 %      4.0 %       2.0 %       1.5 %       1.4 %       1.3 %       1.2 %
10       4.3 %      4.0 %       2.0 %       1.6 %       1.4 %       1.3 %       1.2 %
9        4.3 %      4.0 %       2.2 %       1.7 %       1.5 %       1.4 %       1.4 %
8        4.3 %      4.0 %       2.3 %       1.8 %       1.7 %       1.6 %       1.6 %
7        4.3 %      4.0 %       2.8 %       2.0 %       1.8 %       1.8 %       1.7 %
6        4.3 %      3.9 %       2.8 %       2.1 %       1.9 %       1.8 %       1.8 %
5        4.3 %      3.7 %       2.8 %       2.1 %       1.9 %       1.8 %       1.8 %
4        4.3 %      4.7 %       4.5 %       4.2 %       4.2 %       4.2 %       4.3 %
3        4.3 %      4.2 %       4.0 %       3.8 %       3.8 %       3.8 %       3.8 %
2        4.3 %      3.8 %       4.4 %       3.5 %       3.4 %       3.4 %       3.4 %
1        4.3 %      3.4 %       4.0 %       3.2 %       3.1 %       3.1 %       3.1 %
0        4.3 %      23.6 %      39.3 %      46.9 %      49.0 %      49.7 %      50.3 %

Table 8.2 Evolution of the transient relativities for the uniform initial distribution for the former compulsory Belgian bonus-malus scale.

Level l  r_l^(0)  r_l^(10)  r_l^(20)  r_l^(30)  r_l^(40)  r_l^(50)  r_l
22       0.0 %    264.1 %   270.0 %   270.9 %   271.2 %   271.4 %   271.5 %
21       0.0 %    229.0 %   241.9 %   245.0 %   246.2 %   246.8 %   247.4 %
20       0.0 %    201.1 %   219.5 %   224.8 %   226.9 %   227.9 %   229.1 %
19       0.0 %    179.2 %   200.8 %   207.9 %   211.0 %   212.4 %   214.1 %
18       0.0 %    164.0 %   185.4 %   193.7 %   197.4 %   199.2 %   201.4 %
17       0.0 %    141.3 %   170.4 %   180.7 %   185.3 %   187.6 %   190.3 %
16       0.0 %    132.1 %   159.4 %   170.0 %   175.0 %   177.5 %   180.3 %
15       0.0 %    123.9 %   149.9 %   160.8 %   165.9 %   168.5 %   171.4 %
14       0.0 %    117.9 %   142.2 %   152.8 %   157.8 %   160.3 %   163.1 %
13       0.0 %    114.9 %   136.4 %   145.9 %   150.4 %   152.6 %   154.9 %
12       0.0 %    86.9 %    125.3 %   137.6 %   142.7 %   145.0 %   147.2 %
11       0.0 %    85.7 %    120.1 %   131.9 %   136.5 %   138.5 %   140.2 %
10       0.0 %    83.1 %    115.1 %   126.6 %   131.0 %   132.7 %   134.0 %
9        0.0 %    82.1 %    110.7 %   120.6 %   123.8 %   124.9 %   125.5 %
8        0.0 %    82.6 %    107.3 %   115.0 %   117.0 %   117.4 %   117.3 %
7        0.0 %    81.5 %    94.7 %    107.8 %   110.9 %   111.5 %   111.4 %
6        0.0 %    79.1 %    91.3 %    103.8 %   106.5 %   107.0 %   106.7 %
5        0.0 %    75.5 %    87.6 %    100.0 %   102.6 %   103.0 %   102.8 %
4        0.0 %    77.4 %    80.9 %    83.3 %    83.1 %    82.9 %    82.6 %
3        0.0 %    74.1 %    78.2 %    80.7 %    80.6 %    80.4 %    80.2 %
2        0.0 %    70.4 %    64.8 %    76.7 %    77.9 %    77.9 %    77.8 %
1        0.0 %    66.4 %    61.8 %    74.2 %    75.4 %    75.5 %    75.5 %
0        0.0 %    51.5 %    43.8 %    43.5 %    44.2 %    44.6 %    45.1 %

Table 8.3 Evolution of the transient relativities for the top initial distribution for the former compulsory Belgian bonus-malus scale.

Level l  r_l^(0)  r_l^(10)  r_l^(20)  r_l^(30)  r_l^(40)  r_l^(50)  r_l
22       0.0 %    222.7 %   248.8 %   258.7 %   263.9 %   266.8 %   271.5 %
21       0.0 %    195.2 %   221.7 %   232.8 %   238.7 %   242.1 %   247.4 %
20       0.0 %    174.9 %   201.2 %   213.1 %   219.5 %   223.3 %   229.1 %
19       0.0 %    159.2 %   185.0 %   197.3 %   204.1 %   208.1 %   214.1 %
18       0.0 %    147.0 %   171.9 %   184.3 %   191.3 %   195.4 %   201.4 %
17       0.0 %    94.6 %    133.0 %   154.6 %   167.8 %   176.3 %   190.3 %
16       0.0 %    109.3 %   143.1 %   159.4 %   168.4 %   173.5 %   180.3 %
15       0.0 %    103.1 %   135.4 %   151.5 %   160.3 %   165.2 %   171.4 %
14       0.0 %    98.0 %    128.8 %   144.6 %   153.1 %   157.7 %   163.1 %
13       0.0 %    94.2 %    123.4 %   138.5 %   146.4 %   150.4 %   154.9 %
12       0.0 %    46.0 %    89.0 %    111.6 %   125.6 %   134.4 %   147.2 %
11       0.0 %    85.7 %    102.6 %   122.2 %   131.7 %   136.1 %   140.2 %
10       0.0 %    83.1 %    98.7 %    117.7 %   126.6 %   130.5 %   134.0 %
9        0.0 %    82.1 %    95.4 %    112.6 %   119.9 %   122.9 %   125.5 %
8        0.0 %    82.6 %    92.6 %    107.4 %   113.0 %   115.2 %   117.3 %
7        0.0 %    81.5 %    58.9 %    85.8 %    100.1 %   106.9 %   111.4 %
6        0.0 %    79.1 %    74.6 %    94.4 %    101.3 %   103.9 %   106.7 %
5        0.0 %    75.5 %    72.2 %    91.1 %    97.5 %    99.9 %    102.8 %
4        0.0 %    77.4 %    71.1 %    74.3 %    77.4 %    79.4 %    82.6 %
3        0.0 %    74.1 %    68.5 %    71.9 %    75.1 %    77.0 %    80.2 %
2        0.0 %    70.4 %    31.1 %    63.2 %    74.8 %    77.6 %    77.8 %
1        0.0 %    66.4 %    61.8 %    65.1 %    69.1 %    71.7 %    75.5 %
0        0.0 %    51.5 %    43.8 %    36.1 %    40.4 %    42.5 %    45.1 %


Table 8.4 Evolution of the transient relativities for the bottom initial distribution for the former compulsory Belgian bonus-malus scale.

Level l  r_l^(0)  r_l^(10)  r_l^(20)  r_l^(30)  r_l^(40)  r_l^(50)  r_l
22       0.0 %    304.9 %   290.2 %   281.8 %   277.4 %   275.0 %   271.5 %
21       0.0 %    268.1 %   262.9 %   256.6 %   252.9 %   250.8 %   247.4 %
20       0.0 %    242.9 %   240.2 %   236.1 %   233.4 %   231.8 %   229.1 %
19       0.0 %    207.5 %   224.0 %   221.5 %   218.9 %   217.2 %   214.1 %
18       0.0 %    190.1 %   207.1 %   206.6 %   205.0 %   203.8 %   201.4 %
17       0.0 %    177.6 %   192.9 %   193.3 %   192.6 %   191.9 %   190.3 %
16       0.0 %    181.3 %   184.1 %   182.7 %   181.9 %   181.4 %   180.3 %
15       0.0 %    186.6 %   178.8 %   174.5 %   172.9 %   172.2 %   171.4 %
14       0.0 %    140.8 %   160.7 %   163.3 %   163.7 %   163.6 %   163.1 %
13       0.0 %    148.2 %   156.6 %   156.2 %   155.7 %   155.4 %   154.9 %
12       0.0 %    134.1 %   151.3 %   149.9 %   148.6 %   148.0 %   147.2 %
11       0.0 %    142.6 %   149.3 %   145.5 %   143.0 %   141.7 %   140.2 %
10       0.0 %    148.6 %   147.4 %   141.9 %   138.4 %   136.4 %   134.0 %
9        0.0 %    115.3 %   129.3 %   128.5 %   127.2 %   126.4 %   125.5 %
8        0.0 %    122.3 %   126.3 %   123.1 %   120.6 %   119.2 %   117.3 %
7        0.0 %    125.5 %   120.3 %   118.5 %   115.8 %   114.0 %   111.4 %
6        0.0 %    126.8 %   118.1 %   115.2 %   112.0 %   109.9 %   106.7 %
5        0.0 %    127.0 %   115.9 %   112.2 %   108.7 %   106.4 %   102.8 %
4        0.0 %    99.4 %    92.0 %    88.4 %    86.1 %    84.7 %    82.6 %
3        0.0 %    98.1 %    90.1 %    86.3 %    83.9 %    82.4 %    80.2 %
2        0.0 %    96.6 %    84.8 %    83.8 %    81.6 %    80.1 %    77.8 %
1        0.0 %    95.0 %    82.7 %    81.6 %    79.4 %    77.9 %    75.5 %
0        0.0 %    56.6 %    49.8 %    47.5 %    46.5 %    46.0 %    45.1 %

that even if the ultimate distribution of the policyholders in the scale is the same whatever the initial distribution, the transient relativities are affected by these distributions. Starting with a uniform distribution of the policyholders in the scale (Table 8.2), the r_l^{(n)}'s increase to the r_l's for levels 1 to 22, and they decrease to the limit for l = 0. The same phenomenon arises when starting with the top distribution (Table 8.3), but the difference between the r_l^{(10)}'s and the asymptotic r_l's is now larger. On the contrary, if the bottom distribution is used (Table 8.4) then the r_l^{(n)}'s decrease to their limit r_l.

The relativities \bar{r}_l computed using (8.6) are given in Table 8.5 according to the initial distribution of the policyholders in the scale and for a uniform distribution of age of policy over 50 years, that is, a_n = 1/50 for n = 1, …, 50. For the sake of completeness, the steady state relativities are displayed in the last column. When a uniform initial distribution of the policyholders is used, the relativities based on the transient maximum accuracy criterion are smaller than the steady state relativities, except for level 0. This is the case for all levels if the top distribution is assumed. On the contrary, if the bottom distribution is used then the \bar{r}_l's are larger than the corresponding r_l's.

In order to figure out the impact of the maturity of the portfolio on the relativities, we considered two alternative age structures to the uniform age of policy distribution used so far.


Table 8.5 Relativities computed on the basis of the transient maximum accuracy criterion for the former compulsory Belgian bonus-malus scale, and for different initial distributions.

Level l  Uniform distribution ¯r_l  Top distribution ¯r_l  Bottom distribution ¯r_l  Steady state r_l
22       266.1 %                    241.1 %                283.5 %                   271.5 %
21       234.4 %                    185.4 %                255.2 %                   247.4 %
20       208.6 %                    164.0 %                232.6 %                   229.1 %
19       187.4 %                    147.5 %                213.7 %                   214.1 %
18       170.1 %                    134.5 %                198.3 %                   201.4 %
17       156.1 %                    124.2 %                185.6 %                   190.3 %
16       144.8 %                    115.7 %                175.0 %                   180.3 %
15       135.6 %                    108.8 %                166.0 %                   171.4 %
14       128.1 %                    102.9 %                158.9 %                   163.1 %
13       121.9 %                    98.0 %                 153.7 %                   154.9 %
12       116.6 %                    93.7 %                 148.3 %                   147.2 %
11       111.8 %                    89.8 %                 142.6 %                   140.2 %
10       107.5 %                    86.4 %                 137.1 %                   134.0 %
9        103.8 %                    83.5 %                 133.7 %                   125.5 %
8        100.1 %                    80.8 %                 128.7 %                   117.3 %
7        96.3 %                     78.1 %                 122.8 %                   111.4 %
6        92.7 %                     75.3 %                 117.2 %                   106.7 %
5        89.2 %                     72.7 %                 112.0 %                   102.8 %
4        81.8 %                     68.9 %                 99.7 %                    82.6 %
3        78.0 %                     65.8 %                 94.1 %                    80.2 %
2        74.5 %                     63.0 %                 89.3 %                    77.8 %
1        71.3 %                     60.2 %                 85.0 %                    75.5 %
0        45.8 %                     40.1 %                 52.5 %                    45.1 %

The three distributions (henceforth referred to as mature, young and old portfolios, respectively) are summarized in Table 8.6. Table 8.7 displays the bonus-malus relativities obtained with these different age structures and a uniform initial distribution of the policyholders in the bonus-malus scale. We see that the older the portfolio, the closer the \bar{r}_l's are to the corresponding r_l's.

8.3.2 Linear Scales

Sometimes, it is desirable to have the same relative penalty associated with each level. To this end, the actuary can linearize the \bar{r}_l's, as suggested by Gilde & Sundt (1989). The optimal linear relativity \bar{r}_l^{\,lin} = \alpha + \beta l, l = 0, 1, …, s, in the transient case is thus the solution of the minimization of

E\big[ (\Theta - r_{L_A}^{(A)})^2 \big] = E\big[ (\Theta - \alpha - \beta L_A)^2 \big].

It is easy to check that the solution of this optimization problem is

\beta = \frac{C[\Theta, L_A]}{V[L_A]}    and    \alpha = E[\Theta] - \frac{C[\Theta, L_A]}{V[L_A]} \, E[L_A]


Table 8.6 Values of the a_n's for three portfolios with different maturities.

Age of policy  Mature portfolio  Young portfolio  Old portfolio
1              2 %               10 %             0.5 %
2              2 %               10 %             0.5 %
3              2 %               5 %              0.5 %
4              2 %               5 %              0.5 %
5              2 %               5 %              0.5 %
6              2 %               5 %              0.5 %
7              2 %               5 %              1 %
8              2 %               2.5 %            1 %
9              2 %               2.5 %            1 %
10             2 %               2.5 %            1 %
11             2 %               2.5 %            1 %
12             2 %               2.5 %            1 %
13             2 %               2.5 %            1 %
14             2 %               2.5 %            1 %
15             2 %               2.5 %            1 %
16             2 %               2.5 %            1 %
17             2 %               2.5 %            1 %
18             2 %               1 %              1 %
19             2 %               1 %              1 %
20             2 %               1 %              1 %
21             2 %               1 %              1 %
22             2 %               1 %              1 %
23             2 %               1 %              1 %
24             2 %               1 %              1 %
25             2 %               1 %              1 %
26             2 %               1 %              1 %
27             2 %               1 %              1 %
28             2 %               1 %              1 %
29             2 %               1 %              1 %
30             2 %               1 %              1 %
31             2 %               1 %              1 %
32             2 %               1 %              1 %
33             2 %               1 %              1 %
34             2 %               1 %              2.5 %
35             2 %               1 %              2.5 %
36             2 %               1 %              2.5 %
37             2 %               1 %              2.5 %
38             2 %               1 %              2.5 %
39             2 %               1 %              2.5 %
40             2 %               1 %              2.5 %
41             2 %               1 %              2.5 %
42             2 %               1 %              2.5 %
43             2 %               1 %              2.5 %
44             2 %               1 %              5 %
45             2 %               0.5 %            5 %
46             2 %               0.5 %            5 %
47             2 %               0.5 %            5 %
48             2 %               0.5 %            5 %
49             2 %               0.5 %            10 %
50             2 %               0.5 %            10 %

Table 8.7 Relativities computed on the basis of the transient maximum accuracy criterion for the former compulsory Belgian bonus-malus scale, and for different maturities of the portfolio.

Level l  Mature portfolio ¯r_l  Young portfolio ¯r_l  Old portfolio ¯r_l  Steady state r_l
22       266.1 %                253.6 %               269.7 %             271.5 %
21       234.4 %                208.0 %               242.6 %             247.4 %
20       208.6 %                173.8 %               221.0 %             229.1 %
19       187.4 %                152.9 %               203.0 %             214.1 %
18       170.1 %                138.2 %               187.6 %             201.4 %
17       156.1 %                127.5 %               174.4 %             190.3 %
16       144.8 %                119.7 %               163.1 %             180.3 %
15       135.6 %                113.6 %               152.9 %             171.4 %
14       128.1 %                109.7 %               144.2 %             163.1 %
13       121.9 %                106.4 %               136.5 %             154.9 %
12       116.6 %                103.5 %               129.7 %             147.2 %
11       111.8 %                100.8 %               123.8 %             140.2 %
10       107.5 %                98.2 %                118.6 %             134.0 %
9        103.8 %                96.5 %                113.1 %             125.5 %
8        100.1 %                94.2 %                107.8 %             117.3 %
7        96.3 %                 91.4 %                103.2 %             111.4 %
6        92.7 %                 88.8 %                99.1 %              106.7 %
5        89.2 %                 86.3 %                95.4 %              102.8 %
4        81.8 %                 84.4 %                82.2 %              82.6 %
3        78.0 %                 79.2 %                79.0 %              80.2 %
2        74.5 %                 75.4 %                76.1 %              77.8 %
1        71.3 %                 72.5 %                73.3 %              75.5 %
0        45.8 %                 50.5 %                44.8 %              45.1 %

The linear premium scale is thus of the form

r_l^{\,lin} = E[\Theta] + \frac{C[\Theta, L_A]}{V[L_A]} \big( l - E[L_A] \big)

where

C[\Theta, L_A] = \sum_{n=1}^{+\infty} a_n \sum_{l=0}^{s} l \sum_k w_k \int_0^{+\infty} \vartheta \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta) - E[L_A] \, E[\Theta]

E[L_A] = \sum_{n=1}^{+\infty} a_n \sum_{l=0}^{s} \sum_k w_k \int_0^{+\infty} l \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta)

V[L_A] = \sum_{n=1}^{+\infty} a_n \sum_{l=0}^{s} \sum_k w_k \int_0^{+\infty} \big( l - E[L_A] \big)^2 p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta).
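Since \alpha and \beta are just ordinary least-squares coefficients of \Theta on L_A, they can be computed from the first two moments of L_A and the cross moment E[\Theta L_A]. A sketch under the running assumptions of the earlier snippets (one risk class, Gamma \Theta on a grid, −1/top scale; the function name is ours):

```python
def linear_relativities(ages, p0):
    """Gilde & Sundt style linear scale r_l = alpha + beta * l, with
    beta = C[Theta, L_A] / V[L_A] and alpha = E[Theta] - beta E[L_A];
    all moments are taken under the joint law of (Theta, L_A)."""
    levels = np.arange(s + 1)
    e_t = e_l = e_tl = e_l2 = 0.0
    for n, a_n in enumerate(ages, start=1):
        for t, f in zip(theta, f_theta):
            pn = p0 @ np.linalg.matrix_power(top_scale_matrix(lam * t), n)
            w = a_n * f * d_theta
            e_t += w * t                        # E[Theta] (pn sums to 1)
            e_l += w * (levels * pn).sum()      # E[L_A]
            e_tl += w * t * (levels * pn).sum() # E[Theta L_A]
            e_l2 += w * (levels**2 * pn).sum()  # E[L_A^2]
    beta = (e_tl - e_t * e_l) / (e_l2 - e_l**2)
    alpha = e_t - beta * e_l
    return alpha + beta * levels

print(np.round(100 * linear_relativities(np.full(5, 0.2), p0), 1))
```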

8.3.3 Financial Balance

We know from Chapter 4 that when the relativities are derived from the asymptotic criterion, the bonus-malus system is financially balanced when the steady state has been reached. The only way of keeping this financial balance during the transient behaviour of the bonus-malus scale is to change the relativities applied to the policyholders so that they are equal to the r_l^{(n)}'s in each period n. For commercial reasons, it seems difficult to adopt such a strategy.

But, when the number of levels and the transition rules of the bonus-malus system have been fixed, it is possible to check how the expected financial income evolves with respect to the financial balance for the different periods n = 1, 2, … until the steady state is reached.

If the relativities in force are the r_l's computed on the basis of the asymptotic maximum accuracy criterion, then the expected financial income for policies with age n is

I_n = \sum_{l=0}^{s} r_l \Pr[L_n = l] = \sum_{l=0}^{s} \sum_k w_k \int_0^{+\infty} r_l \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta)    (8.7)

where the p_l^{(n)}(\cdot)'s are computed using (4.6).

Alternatively, if the relativities are the \bar{r}_l's computed on the basis of the transient maximum accuracy criterion, then the expected financial income for policies with age n is

\bar{I}_n = \sum_{l=0}^{s} \bar{r}_l \Pr[L_n = l] = \sum_{l=0}^{s} \sum_k w_k \int_0^{+\infty} \bar{r}_l \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta).    (8.8)

The evolution of the expected income, until the steady state has been reached, is probably one of the most important parameters to take into account.
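Computing (8.7) or (8.8) only requires the n-step level distribution and a vector of relativities. A minimal sketch under the same simplifying assumptions as the earlier snippets (the function name is ours; pass either the asymptotic r_l's or the \bar{r}_l's as rel):

```python
def expected_income(n, rel, p0):
    """I_n of (8.7), or bar I_n of (8.8), for policies of age n:
    the average relativity collected at time n, mixed over Theta."""
    inc = 0.0
    for t, f in zip(theta, f_theta):
        pn = p0 @ np.linalg.matrix_power(top_scale_matrix(lam * t), n)
        inc += (rel * pn).sum() * f * d_theta
    return inc

rel = bar_relativities(np.full(5, 0.2), p0)
print([round(100 * expected_income(n, rel, p0), 1) for n in range(1, 6)])
```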

Example 8.3 (Former Belgian Compulsory Bonus-Malus Scale) Table 8.8 displays the evolution of the expected financial income I_n (computed with the steady state relativities) according to the initial distribution of the policyholders. It slowly converges to 100 %. The choice of the uniform initial distribution or the top initial distribution leads to an expected financial income greater than 100 %. Indeed, too many policyholders (with respect to the steady state situation) are in the malus levels, thus providing a greater income to the company. Conversely, with a bottom initial distribution, the expected financial income is smaller than 100 % as too many policyholders are in the bonus levels.


Table 8.8 Evolution of the expected financial income I_n based on the r_l's (influence of the initial distribution).

Age of policy  Uniform distribution I_n  Top distribution I_n  Bottom distribution I_n
0              146.5 %                   242.8 %               68.4 %
1              143.1 %                   225.8 %               72.0 %
2              140.1 %                   214.1 %               75.3 %
3              137.4 %                   205.5 %               78.4 %
4              134.9 %                   198.7 %               81.2 %
5              132.6 %                   193.2 %               82.6 %
6              130.4 %                   187.8 %               83.7 %
7              128.2 %                   182.8 %               84.8 %
8              126.1 %                   178.3 %               86.0 %
9              124.1 %                   174.0 %               87.2 %
10             122.2 %                   170.1 %               88.0 %
11             120.3 %                   166.3 %               88.6 %
12             118.6 %                   162.9 %               89.2 %
13             116.9 %                   158.9 %               89.9 %
14             115.3 %                   155.1 %               90.6 %
15             113.9 %                   152.1 %               91.0 %
16             112.5 %                   149.3 %               91.4 %
17             111.2 %                   146.8 %               91.8 %
18             109.9 %                   140.7 %               92.2 %
19             108.9 %                   138.3 %               92.6 %
20             107.9 %                   136.3 %               92.9 %
21             107.0 %                   134.5 %               93.2 %
22             106.1 %                   127.5 %               93.5 %
23             105.5 %                   124.3 %               93.8 %
24             105.0 %                   123.1 %               94.2 %
25             104.6 %                   122.0 %               94.4 %
26             104.1 %                   121.0 %               94.7 %
27             103.7 %                   117.6 %               94.9 %
28             103.4 %                   115.8 %               95.2 %
29             103.1 %                   115.0 %               95.4 %
30             102.8 %                   114.4 %               95.7 %
31             102.6 %                   113.7 %               95.9 %
32             102.3 %                   111.9 %               96.0 %
33             102.2 %                   110.8 %               96.2 %
34             102.0 %                   110.3 %               96.5 %
35             101.9 %                   109.9 %               96.6 %
36             101.7 %                   109.5 %               96.8 %
37             101.6 %                   108.4 %               96.9 %
38             101.5 %                   107.6 %               97.1 %
39             101.4 %                   107.3 %               97.2 %
40             101.3 %                   107.0 %               97.4 %
41             101.2 %                   106.7 %               97.5 %
42             101.1 %                   106.0 %               97.6 %
43             101.0 %                   105.5 %               97.7 %
44             100.9 %                   105.3 %               97.8 %
45             100.9 %                   105.1 %               98.0 %
46             100.8 %                   104.9 %               98.0 %
47             100.8 %                   104.4 %               98.1 %
48             100.7 %                   104.1 %               98.2 %
49             100.7 %                   103.9 %               98.3 %
50             100.6 %                   103.8 %               98.4 %

8.3.4 Choice of an Initial Level

As the number of years increases, the influence of the initial level diminishes for the individual policy, and it vanishes in the limit. Therefore, the choice of the initial level cannot be made part of the optimizing procedure based on the asymptotic criterion.

When the transient behaviour is taken into account, it is convenient to select the optimal starting level by maximizing a measure of efficiency for the bonus-malus scale. In the

transient case, the Q_n-efficiency is defined as

e_n = \sum_{l=0}^{s} \big( r_l^{(n)} \big)^2 \Pr[L_n = l] = \sum_{l=0}^{s} \sum_k w_k \int_0^{+\infty} \big( r_l^{(n)} \big)^2 p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta).    (8.9)

The Q_n-efficiency is equal to the variance V[r_{L_n}^{(n)}] up to a constant term, since

V[r_{L_n}^{(n)}] = E\big[ (r_{L_n}^{(n)})^2 \big] - \big( E[r_{L_n}^{(n)}] \big)^2 = e_n - 1.    (8.10)

Looking for an optimum of e_n, it is then equivalent to use (8.9) or (8.10). The Q-efficiency, which is closely related to the variance V[r_{L_\infty}], is

e = \sum_{l=0}^{s} (r_l)^2 \Pr[L_\infty = l] = \sum_{l=0}^{s} \sum_k w_k \int_0^{+\infty} (r_l)^2 \, \pi_l(\vartheta\lambda_k) \, dF_\Theta(\vartheta) = V[r_{L_\infty}] + 1.

Finally, the \bar{Q}-efficiency, which is equivalent to V[\bar{r}_{L_A}], is

\bar{e} = \sum_{l=0}^{s} (\bar{r}_l)^2 \Pr[L_A = l] = \sum_{n=1}^{+\infty} a_n \sum_{l=0}^{s} \sum_k w_k \int_0^{+\infty} (\bar{r}_l)^2 \, p_l^{(n)}(\vartheta\lambda_k) \, dF_\Theta(\vartheta) = V[\bar{r}_{L_A}] + 1.

To choose the initial level, it suffices to compute the relativities \bar{r}_l, with the help of (8.6), for each initial distribution e_k, where e_k is the vector with 1 in the kth entry and 0 elsewhere. After having repeated the process for each e_k, k = 0, 1, …, s, we compute the \bar{Q}-efficiency of each solution. The optimal initial level is then the one that maximizes the \bar{Q}-efficiency.
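A sketch of this selection procedure, still under the running assumptions. A strictly degenerate e_k would make (8.4) undefined, so we use a 0.95/0.01-style smoothing of the kind applied to the −1/top scale in Section 8.5 (the exact smoothing weights here, and the function names, are our choices):

```python
def smoothed_e(k):
    """Near-degenerate start: 95% of the mass in level k, the rest
    spread out so that (8.4) stays well defined (cf. Section 8.5)."""
    p = np.full(s + 1, 0.05 / s)
    p[k] = 0.95
    return p

def q_bar_efficiency(rel, ages, p0):
    """bar-Q efficiency: E[(bar r_{L_A})^2], i.e. V[bar r_{L_A}] + 1
    when the scale is financially balanced."""
    e2 = 0.0
    for n, a_n in enumerate(ages, start=1):
        for t, f in zip(theta, f_theta):
            pn = p0 @ np.linalg.matrix_power(top_scale_matrix(lam * t), n)
            e2 += a_n * f * d_theta * (rel**2 * pn).sum()
    return e2

ages = np.full(5, 0.2)
scores = {k: q_bar_efficiency(bar_relativities(ages, smoothed_e(k)),
                              ages, smoothed_e(k))
          for k in range(s + 1)}
print("Q-bar optimal starting level:", max(scores, key=scores.get))
```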



8.4 Exponential Loss Function

Under an exponential loss function, the aim is to minimize the objective function

E\Big[ \exp\big( -c\,(\Theta - r_{L_n}^{(n)}) \big) \Big]

under the constraint E[r_{L_n}^{(n)}] = E[\Theta]. This yields

r_l^{(n)} = E[\Theta] + \frac{1}{c} \Big( E\big[ \ln E[\exp(-c\Theta) \mid L_n] \big] - \ln E[\exp(-c\Theta) \mid L_n = l] \Big)

for l = 0, 1, …, s.

Of course, there is no reason to focus on the particular nth period. Then, nonnegative weights a_1, a_2, a_3, … summing to 1, representing the age distribution of the policies in the portfolio, are introduced and the aim of the actuary is to minimize the weighted average

\sum_{n=1}^{+\infty} a_n \, E\Big[ \exp\big( -c\,(\Theta - r_{L_n}^{(n)}) \big) \Big].

Minimizing this weighted average amounts to minimizing the expected rating error, under the exponential loss, for a randomly chosen policy.
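The exponential-loss relativities can be evaluated with the same numerical machinery as before. The sketch below (one risk class, Gamma \Theta on a grid, severity parameter c chosen by the user; function name ours) enforces the financial-balance constraint by construction, since the level-averaged correction term cancels out in expectation:

```python
def exp_loss_relativities(n, c, p0):
    """Exponential-loss relativities of Section 8.4, one risk class:
    r_l = E[Theta] + (E[ln E[e^{-c Theta}|L_n]]
                      - ln E[e^{-c Theta}|L_n = l]) / c."""
    num = np.zeros(s + 1)      # E[exp(-c Theta); L_n = l]
    den = np.zeros(s + 1)      # Pr[L_n = l]
    for t, f in zip(theta, f_theta):
        pn = p0 @ np.linalg.matrix_power(top_scale_matrix(lam * t), n)
        num += np.exp(-c * t) * pn * f * d_theta
        den += pn * f * d_theta
    cond = np.log(num / den)               # ln E[e^{-c Theta} | L_n = l]
    mean_theta = (theta * f_theta).sum() * d_theta
    return mean_theta + (den @ cond - cond) / c

print(np.round(100 * exp_loss_relativities(5, 1.0, p0), 1))
```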

8.5 Numerical Illustrations

8.5.1 Scale −1/Top

The transition rules of the −1/top scale are given in Table 4.1.

Initial Distribution

We have tested three different initial distributions Pr[L_0 = l]. In the first case (uniform distribution), the policyholders are uniformly spread in the scale (16.67 % of the policyholders in each of the six levels). In the second case (top distribution), 95 % of the policyholders are concentrated in the top of the scale (and the remaining 5 % are evenly spread over levels 0 to 4). In the last case (bottom distribution), 95 % of the policyholders start from the bottom of the scale (and the remaining 5 % are evenly spread over levels 1 to 5).

Convergence of the −1/Top Scale

Figure 8.2 represents the evolution of C_n with n for a uniform initial distribution. It gives an idea of the speed of convergence of the −1/top scale. We clearly see that C_n = 0 for n ≥ 5, i.e. that the stationary state is reached after 5 years. This was known from Chapter 4.

Figure 8.2 Convergence of the transient policyholders' distributions to the steady state distribution for the system −1/top (the plot shows C_n against the age of policy, n = 0, 1, …, 5).

Transient Relativities

Table 8.9 gives the evolution with n of the corresponding transient relativities for the three starting distributions. We see that the transient relativities do not depend on the initial distribution of the policyholders inside the scale. This comes from the fact that the transient distributions do not depend on the initial distribution for the −1/top scale.
distributions do not depend on the initial distribution for the −1/top scale.<br />

This somewhat surprising situation can be explained as follows. The distribution of the policyholders in the −1/top scale after one year is given by

\begin{pmatrix} p_0^{(1)}(\vartheta\lambda) \\ p_1^{(1)}(\vartheta\lambda) \\ p_2^{(1)}(\vartheta\lambda) \\ p_3^{(1)}(\vartheta\lambda) \\ p_4^{(1)}(\vartheta\lambda) \\ p_5^{(1)}(\vartheta\lambda) \end{pmatrix}^{T}
= \begin{pmatrix} p_0^{(0)} \\ p_1^{(0)} \\ p_2^{(0)} \\ p_3^{(0)} \\ p_4^{(0)} \\ p_5^{(0)} \end{pmatrix}^{T}
\begin{pmatrix}
\exp(-\vartheta\lambda) & 0 & 0 & 0 & 0 & 1-\exp(-\vartheta\lambda) \\
\exp(-\vartheta\lambda) & 0 & 0 & 0 & 0 & 1-\exp(-\vartheta\lambda) \\
0 & \exp(-\vartheta\lambda) & 0 & 0 & 0 & 1-\exp(-\vartheta\lambda) \\
0 & 0 & \exp(-\vartheta\lambda) & 0 & 0 & 1-\exp(-\vartheta\lambda) \\
0 & 0 & 0 & \exp(-\vartheta\lambda) & 0 & 1-\exp(-\vartheta\lambda) \\
0 & 0 & 0 & 0 & \exp(-\vartheta\lambda) & 1-\exp(-\vartheta\lambda)
\end{pmatrix}

= \begin{pmatrix}
(p_0^{(0)} + p_1^{(0)}) \exp(-\vartheta\lambda) \\
p_2^{(0)} \exp(-\vartheta\lambda) \\
p_3^{(0)} \exp(-\vartheta\lambda) \\
p_4^{(0)} \exp(-\vartheta\lambda) \\
p_5^{(0)} \exp(-\vartheta\lambda) \\
1 - \exp(-\vartheta\lambda)
\end{pmatrix}^{T}


Table 8.9 Transient relativities for the different initial distributions (−1/top scale).

Uniform initial distribution
Level l  r_l^(0)  r_l^(1)  r_l^(2)  r_l^(3)  r_l^(4)  r_l^(5)  r_l
5        0.0 %    181.2 %  181.2 %  181.2 %  181.2 %  181.2 %  181.2 %
4        0.0 %    88.2 %   159.9 %  159.9 %  159.9 %  159.9 %  159.9 %
3        0.0 %    88.2 %   79.2 %   143.9 %  143.9 %  143.9 %  143.9 %
2        0.0 %    88.2 %   79.2 %   72.0 %   131.2 %  131.2 %  131.2 %
1        0.0 %    88.2 %   79.2 %   72.0 %   66.1 %   120.9 %  120.9 %
0        0.0 %    88.2 %   79.2 %   72.0 %   66.1 %   61.2 %   61.2 %

Top initial distribution
Level l  r_l^(0)  r_l^(1)  r_l^(2)  r_l^(3)  r_l^(4)  r_l^(5)  r_l
5        0.0 %    181.2 %  181.2 %  181.2 %  181.2 %  181.2 %  181.2 %
4        0.0 %    88.2 %   159.9 %  159.9 %  159.9 %  159.9 %  159.9 %
3        0.0 %    88.2 %   79.2 %   143.9 %  143.9 %  143.9 %  143.9 %
2        0.0 %    88.2 %   79.2 %   72.0 %   131.2 %  131.2 %  131.2 %
1        0.0 %    88.2 %   79.2 %   72.0 %   66.1 %   120.9 %  120.9 %
0        0.0 %    88.2 %   79.2 %   72.0 %   66.1 %   61.2 %   61.2 %

Bottom initial distribution
Level l  r_l^(0)  r_l^(1)  r_l^(2)  r_l^(3)  r_l^(4)  r_l^(5)  r_l
5        0.0 %    181.2 %  181.2 %  181.2 %  181.2 %  181.2 %  181.2 %
4        0.0 %    88.2 %   159.9 %  159.9 %  159.9 %  159.9 %  159.9 %
3        0.0 %    88.2 %   79.2 %   143.9 %  143.9 %  143.9 %  143.9 %
2        0.0 %    88.2 %   79.2 %   72.0 %   131.2 %  131.2 %  131.2 %
1        0.0 %    88.2 %   79.2 %   72.0 %   66.1 %   120.9 %  120.9 %
0        0.0 %    88.2 %   79.2 %   72.0 %   66.1 %   61.2 %   61.2 %

The corresponding relativities are obtained by (8.4). For level 0, for instance,

r_0^{(1)} = \frac{\int_0^{+\infty} \vartheta \sum_k w_k (p_0^{(0)} + p_1^{(0)}) \exp(-\vartheta\lambda_k) \, dF_\Theta(\vartheta)}{\int_0^{+\infty} \sum_k w_k (p_0^{(0)} + p_1^{(0)}) \exp(-\vartheta\lambda_k) \, dF_\Theta(\vartheta)},

and the constant p_0^{(0)} + p_1^{(0)} cancels between the numerator and the denominator; the same cancellation occurs at every level, so that

r_0^{(1)} = r_1^{(1)} = r_2^{(1)} = r_3^{(1)} = r_4^{(1)} = \frac{\sum_k w_k \int_0^{+\infty} \vartheta \exp(-\vartheta\lambda_k) \, dF_\Theta(\vartheta)}{\sum_k w_k \int_0^{+\infty} \exp(-\vartheta\lambda_k) \, dF_\Theta(\vartheta)}

r_5^{(1)} = \frac{\sum_k w_k \int_0^{+\infty} \vartheta \big( 1 - \exp(-\vartheta\lambda_k) \big) \, dF_\Theta(\vartheta)}{\sum_k w_k \int_0^{+\infty} \big( 1 - \exp(-\vartheta\lambda_k) \big) \, dF_\Theta(\vartheta)}.
⎠<br />

We see that the relativities at time 1 do not depend on the initial distribution (p_l^{(0)}, l = 0, 1, …, 5). Moreover, we observe that the values r_0^{(1)} to r_4^{(1)} are all equal. Only the value of r_5^{(1)} is different from the others and is equal to the steady-state value r_5.
is different from the others and is equal to the steady-state value r 5 .<br />

The distribution of the policyholders in the −1/top scale after two years is then given by

\begin{pmatrix} p_0^{(2)}(\vartheta\lambda) \\ p_1^{(2)}(\vartheta\lambda) \\ p_2^{(2)}(\vartheta\lambda) \\ p_3^{(2)}(\vartheta\lambda) \\ p_4^{(2)}(\vartheta\lambda) \\ p_5^{(2)}(\vartheta\lambda) \end{pmatrix}^{T}
= \begin{pmatrix} p_0^{(0)} \\ p_1^{(0)} \\ p_2^{(0)} \\ p_3^{(0)} \\ p_4^{(0)} \\ p_5^{(0)} \end{pmatrix}^{T}
\begin{pmatrix}
\exp(-2\vartheta\lambda) & 0 & 0 & 0 & \exp(-\vartheta\lambda)(1-\exp(-\vartheta\lambda)) & 1-\exp(-\vartheta\lambda) \\
\exp(-2\vartheta\lambda) & 0 & 0 & 0 & \exp(-\vartheta\lambda)(1-\exp(-\vartheta\lambda)) & 1-\exp(-\vartheta\lambda) \\
\exp(-2\vartheta\lambda) & 0 & 0 & 0 & \exp(-\vartheta\lambda)(1-\exp(-\vartheta\lambda)) & 1-\exp(-\vartheta\lambda) \\
0 & \exp(-2\vartheta\lambda) & 0 & 0 & \exp(-\vartheta\lambda)(1-\exp(-\vartheta\lambda)) & 1-\exp(-\vartheta\lambda) \\
0 & 0 & \exp(-2\vartheta\lambda) & 0 & \exp(-\vartheta\lambda)(1-\exp(-\vartheta\lambda)) & 1-\exp(-\vartheta\lambda) \\
0 & 0 & 0 & \exp(-2\vartheta\lambda) & \exp(-\vartheta\lambda)(1-\exp(-\vartheta\lambda)) & 1-\exp(-\vartheta\lambda)
\end{pmatrix}

= \begin{pmatrix}
(p_0^{(0)} + p_1^{(0)} + p_2^{(0)}) \exp(-2\vartheta\lambda) \\
p_3^{(0)} \exp(-2\vartheta\lambda) \\
p_4^{(0)} \exp(-2\vartheta\lambda) \\
p_5^{(0)} \exp(-2\vartheta\lambda) \\
\exp(-\vartheta\lambda)(1 - \exp(-\vartheta\lambda)) \\
1 - \exp(-\vartheta\lambda)
\end{pmatrix}^{T}
1 − exp−


The corresponding relativities are obtained by (8.4). After cancellation of the constants p_l^{(0)}, this gives

r_0^{(2)} = r_1^{(2)} = r_2^{(2)} = r_3^{(2)} = \frac{\sum_k w_k \int_0^{+\infty} \vartheta \exp(-2\vartheta\lambda_k) \, dF_\Theta(\vartheta)}{\sum_k w_k \int_0^{+\infty} \exp(-2\vartheta\lambda_k) \, dF_\Theta(\vartheta)}

r_4^{(2)} = \frac{\sum_k w_k \int_0^{+\infty} \vartheta \exp(-\vartheta\lambda_k)\big( 1 - \exp(-\vartheta\lambda_k) \big) \, dF_\Theta(\vartheta)}{\sum_k w_k \int_0^{+\infty} \exp(-\vartheta\lambda_k)\big( 1 - \exp(-\vartheta\lambda_k) \big) \, dF_\Theta(\vartheta)}

r_5^{(2)} = \frac{\sum_k w_k \int_0^{+\infty} \vartheta \big( 1 - \exp(-\vartheta\lambda_k) \big) \, dF_\Theta(\vartheta)}{\sum_k w_k \int_0^{+\infty} \big( 1 - \exp(-\vartheta\lambda_k) \big) \, dF_\Theta(\vartheta)}.

Once again, we can see that the relativities at time 2 do not depend on the initial distribution. Now, r_4^{(2)} and r_5^{(2)} are equal to the stationary relativities r_4 and r_5, respectively, and the values r_0^{(2)} to r_3^{(2)} are equal. Similar expressions can be computed for times 3, 4 and 5 to show that the transient relativities do not depend on the initial distribution.
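This independence from the initial distribution is also easy to confirm numerically with the sketches introduced earlier in this chapter: each p_l^{(n)}(\vartheta\lambda) factorizes into a constant depending on p^{(0)} times a function of \vartheta, so the constant cancels in (8.4). A quick check (reusing transient_relativities and smoothed_e defined above):

```python
# Numerical check: for the -1/top scale, the transient relativities
# coincide for any initial distribution of the policyholders.
uniform = np.full(s + 1, 1 / (s + 1))
for n in (1, 2, 3, 4):
    r_ref = transient_relativities(n, uniform)
    for p_init in (smoothed_e(s), smoothed_e(0)):   # top, bottom
        assert np.allclose(r_ref, transient_relativities(n, p_init))
    print(n, np.round(100 * r_ref, 1))
```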

The relativities \bar{r}_l are displayed in Table 8.10 for the three initial distributions, assuming a uniform distribution of age of policy (a_n = 1/5 for n = 1 to 5). The relativities computed from the bottom initial distribution are close to the steady state relativities given in the last column (except for level 0) whereas the relativities computed from a uniform or a top initial distribution are weaker than the stationary relativities for levels 1 to 5. So we see that the initial distribution can have a great influence on the resulting \bar{r}_l's.


Table 8.10 Relativities computed on the basis of the transient maximum accuracy criterion for the different initial distributions (−1/top scale).

Level l  Uniform distribution ¯r_l  Top distribution ¯r_l  Bottom distribution ¯r_l  Steady state r_l
5        181.2 %                    181.2 %                181.2 %                   181.2 %
4        140.3 %                    111.1 %                158.3 %                   159.9 %
3        111.4 %                    94.6 %                 139.9 %                   143.9 %
2        92.8 %                     81.5 %                 123.3 %                   131.2 %
1        81.5 %                     70.9 %                 105.1 %                   120.9 %
0        71.1 %                     63.2 %                 74.6 %                    61.2 %

Influence of the Maturity of the Portfolio

Let us now examine the influence of the maturity of the portfolio. To this end, let us consider the three different distributions of the age of the policies Pr[A = n] = a_n that are displayed in the following table:

Age of policy  Mature portfolio  Young portfolio  Old portfolio
1              20 %              30 %             10 %
2              20 %              25 %             15 %
3              20 %              20 %             20 %
4              20 %              15 %             25 %
5              20 %              10 %             30 %

The first one is a uniform distribution which represents a mature portfolio. The second distribution represents a relatively young portfolio or a portfolio which is growing, i.e. Pr[A = n] decreases with n. Finally, the third distribution represents an old portfolio or a portfolio which is declining, i.e. Pr[A = n] increases with n.

Assuming a uniform initial distribution of the policyholders in the scale, we get the \bar{r}_l's displayed in Table 8.11.

Table 8.11 Relativities computed on the basis of the transient maximum accuracy criterion for different maturities of the portfolio (−1/top scale).

Level l  Mature portfolio ¯r_l  Young portfolio ¯r_l  Old portfolio ¯r_l  Steady state r_l
5        181.2 %                181.2 %               181.2 %             181.2 %
4        140.3 %                131.8 %               149.6 %             159.9 %
3        111.4 %                103.0 %               121.4 %             143.9 %
2        92.8 %                 88.3 %                98.4 %              131.2 %
1        81.5 %                 81.1 %                81.9 %              120.9 %
0        71.1 %                 74.4 %                68.3 %              61.2 %
0 711% 744% 683% 612%



Financial Balance

Finally, we compare the evolution of the expected financial income in two different cases. First, Table 8.12 presents the evolution of I_n (computed using (8.7)) when three different initial distributions are used. We see that, in each situation, the financial balance is reached after 5 years (the time needed to reach the steady state). We notice that the uniform and the top initial distributions ensure profit in the first years whereas the bottom initial distribution causes losses in the first years. Too many policyholders (with respect to the steady state situation) are in the malus levels of the scale in the first two cases whereas too many policyholders are in the bonus level in the last case.

Table 8.12 also gives the evolution of the expected financial income \bar{I}_n computed using (8.8). The varying parameter is the distribution of the age of the policies. We see that this parameter has little influence on the results. The most interesting point to notice is that the expected financial income does not converge to 100 % but goes down under 100 %. This is the result of the use of the \bar{r}_l's.

Choice of the Initial Level

On the basis of the concepts presented in Section 8.3.4, we now try to find the most efficient initial level for the −1/top scale. The procedure can be summarized as follows: for each initial distribution p^{(0)} = e_k (where e_k is the vector with 0.95 in the kth entry and 0.01 elsewhere), we compute the relativities \bar{r}_l with the help of (8.6), as well as the \bar{Q}-efficiency of each solution. The optimal initial level is then the one which maximises the \bar{Q}-efficiency.

Table 8.12 Evolution of the expected financial income I_n based on the r_l's and Ī_n based on the ¯r_l's.

n  Uniform distribution I_n  Top distribution I_n  Bottom distribution I_n
0  133.0 %                   178.3 %               65.5 %
1  121.7 %                   160.1 %               79.2 %
2  113.5 %                   148.0 %               87.7 %
3  107.5 %                   139.3 %               93.4 %
4  103.2 %                   132.9 %               97.2 %
5  100.0 %                   100.0 %               100.0 %

n  Mature portfolio Ī_n  Young portfolio Ī_n  Old portfolio Ī_n
0  113.1 %               110.0 %              116.8 %
1  105.7 %               103.5 %              108.6 %
2  101.2 %               100.0 %              103.1 %
3  98.7 %                98.2 %               99.8 %
4  97.5 %                97.4 %               98.1 %
5  96.9 %                97.0 %               97.3 %


Table 8.13 Choice of the initial class for the −1/top scale.

Starting level  Level of the resulting scale                                 Efficiency ē
                5        4        3        2        1        0
5               181.2 %  111.1 %  94.6 %   81.5 %   70.9 %   63.2 %          1.1231
4               181.2 %  158.3 %  100.2 %  86.7 %   75.7 %   64.5 %          1.155
3               181.2 %  158.3 %  139.9 %  93.6 %   81.8 %   67.1 %          1.1671
2               181.2 %  158.3 %  139.9 %  123.3 %  89.7 %   70.4 %          1.1692
1               181.2 %  158.3 %  139.9 %  123.3 %  105.1 %  74.6 %          1.1657
0               181.2 %  158.3 %  139.9 %  123.3 %  105.1 %  74.6 %          1.1657

The results are given in Table 8.13. We conclude that level l = 2 is the \bar{Q}-optimal initial level whereas level 5 is the usual starting level of this scale. The evolution of the expected financial income I_n when starting in level 2 is as follows:

I_0 = 131.4 %
I_1 = 128.2 %
I_2 = 87.7 %
I_3 = 93.4 %
I_4 = 97.2 %
I_5 = 100.0 %

8.5.2 −1/+2 Scale

The transition rules of the −1/+2 bonus-malus scale are given in Table 4.2.

Initial Distribution

As for the −1/top scale, we have tested three different initial distributions of the policyholders in the scale. In the first case (uniform distribution), the policyholders are uniformly spread in the scale (16.67 % of the policyholders in each of the six levels). In the second case (top distribution), 95 % of the policyholders are concentrated in the top of the scale (and the remaining 5 % are spread over levels 0 to 4). In the last case (bottom distribution), 95 % of the policyholders start from the bottom of the scale.

Convergence of the −1/+2 Scale

Figure 8.3 shows the convergence of the −1/+2 scale by plotting the evolution of C_n. The convergence is rather fast during the first five years and then slows down. The level \epsilon = 0.05 is reached after about eight years (C_8 < 0.05).


Figure 8.3 Convergence of the transient policyholders' distributions to the steady state distribution for the system −1/+2 (the plot shows C_n against the age of policy, n = 0, 1, …, 8).

The corresponding relativities \bar{r}_l are presented in Table 8.15 for a uniform distribution of age of policy. The last column gives the steady state relativities. We notice that the relativities are more severe when the initial distribution is closer to the bottom initial distribution. This seems reasonable since, with the bottom initial distribution, many policyholders are in the bonus level from the beginning. Therefore, the bonus must be weaker and the maluses stronger to ensure the financial balance. As for the −1/top bonus-malus scale, the relativities obtained starting from the bottom initial distribution are the closest to the stationary relativities.

Influence of the Maturity of the Portfolio

Let us consider the following three distributions of age of policy:

Age of policy  Mature portfolio  Young portfolio  Old portfolio
1              12.5 %            25 %             5 %
2              12.5 %            20 %             5 %
3              12.5 %            15 %             10 %
4              12.5 %            10 %             10 %
5              12.5 %            10 %             10 %
6              12.5 %            10 %             15 %
7              12.5 %            5 %              20 %
8              12.5 %            5 %              25 %


Table 8.14 Evolution of the transient relativities for the different initial distributions (−1/+2 scale).

Uniform initial distribution
Level l  r_l^(0)  r_l^(1)  r_l^(2)  r_l^(5)  r_l^(7)  r_l^(8)  r_l
5        0.0 %    188.6 %  204.8 %  250.3 %  263.1 %  265.2 %  271.4 %
4        0.0 %    99.4 %   167.3 %  203.1 %  208.8 %  214.2 %  218.5 %
3        0.0 %    97.2 %   93.5 %   158.6 %  182.1 %  183.8 %  192.5 %
2        0.0 %    97.2 %   97.1 %   132.3 %  137.9 %  137.8 %  138.8 %
1        0.0 %    88.2 %   86.3 %   122.2 %  121.0 %  127.6 %  128.6 %
0        0.0 %    88.2 %   79.2 %   67.4 %   67.8 %   68.0 %   68.5 %

Top initial distribution
Level l  r_l^(0)  r_l^(1)  r_l^(2)  r_l^(5)  r_l^(7)  r_l^(8)  r_l
5        0.0 %    181.5 %  182.0 %  227.8 %  254.6 %  254.7 %  271.4 %
4        0.0 %    88.3 %   160.2 %  198.7 %  188.8 %  212.6 %  218.5 %
3        0.0 %    97.2 %   79.4 %   129.2 %  177.2 %  166.1 %  192.5 %
2        0.0 %    97.2 %   97.1 %   131.3 %  135.8 %  135.8 %  138.8 %
1        0.0 %    88.2 %   86.3 %   121.0 %  108.2 %  125.5 %  128.6 %
0        0.0 %    88.2 %   79.2 %   61.6 %   61.6 %   66.0 %   68.5 %

Bottom initial distribution
Level l  r_l^(0)  r_l^(1)  r_l^(2)  r_l^(5)  r_l^(7)  r_l^(8)  r_l
5        0.0 %    242.0 %  281.7 %  276.4 %  273.5 %  273.4 %  271.4 %
4        0.0 %    185.0 %  215.4 %  217.8 %  218.0 %  218.3 %  218.5 %
3        0.0 %    97.2 %   155.4 %  188.7 %  191.0 %  191.5 %  192.5 %
2        0.0 %    162.7 %  144.8 %  140.8 %  141.5 %  139.4 %  138.8 %
1        0.0 %    88.2 %   144.7 %  134.9 %  128.1 %  130.7 %  128.6 %
0        0.0 %    88.2 %   79.2 %   72.4 %   70.8 %   69.9 %   68.5 %
0 00% 882% 792% 724% 708% 699% 685%<br />

Table 8.15 Relativities computed on the basis <strong>of</strong> the transient maximum accuracy criterion<br />

(influence <strong>of</strong> the initial distribution).<br />

Level l Uniform Top Bottom Steady state<br />

distribution ¯r l distribution ¯r l distribution ¯r l distribution r l<br />

5 2326 % 2071 % 2749 % 2714%<br />

4 1643 % 1231 % 2140 % 2185%<br />

3 1318 % 1074 % 1847 % 1925%<br />

2 1145% 952 % 1447 % 1388%<br />

1 994% 839 % 1317 % 1286%<br />

0 710% 634% 759% 685%


Table 8.16 Relativities computed on the basis of the transient maximum accuracy criterion (influence of the maturity of the portfolio).

Level l  Mature portfolio ¯r_l  Young portfolio ¯r_l  Old portfolio ¯r_l  Steady state r_l
5        232.6 %                218.6 %               245.9 %             271.4 %
4        164.3 %                144.0 %               184.6 %             218.5 %
3        131.8 %                116.4 %               152.1 %             192.5 %
2        114.5 %                106.6 %               122.7 %             138.8 %
1        99.4 %                 93.8 %                107.0 %             128.6 %
0        71.0 %                 73.8 %                69.3 %              68.5 %

The influence of the maturity of the portfolio is illustrated in Table 8.16. We notice that the relativities are more severe when the age of the portfolio is greater.

Choice of the Initial Level

We have computed the efficiency with different starting levels. Table 8.17 indicates that level 2 is the \bar{Q}-optimal level (\bar{Q}-efficiency of 1.2330). The evolution of the expected financial income when using an initial level l = 2 is as follows:

I_0 = 140.6 %
I_1 = 141.3 %
I_2 = 101.9 %
I_3 = 100.9 %
I_4 = 104.4 %
I_5 = 100.0 %
I_6 = 100.0 %
I_7 = 100.9 %
I_8 = 99.9 %

Table 8.17 Choice of the initial class for the −1/+2 scale.

Starting level  Level of the resulting scale                                 Efficiency ē
                5        4        3        2        1        0
5               207.1 %  123.1 %  107.4 %  95.2 %   83.9 %   63.4 %          1.1532
4               216.7 %  177.4 %  111.8 %  100.2 %  87.8 %   65.1 %          1.1942
3               231.3 %  185.1 %  158.2 %  105.9 %  94.1 %   67.5 %          1.2187
2               258.2 %  196.9 %  167.5 %  134.9 %  100.9 %  71.1 %          1.2330
1               267.5 %  210.2 %  177.7 %  140.4 %  127.6 %  74.9 %          1.2321
0               274.9 %  214.0 %  184.7 %  144.7 %  131.7 %  75.9 %          1.2251



8.6 Super Bonus Level

8.6.1 Mechanism

As explained above, the companies operating in Belgium continue to use the former compulsory bonus-malus scale, despite the deregulation of the a posteriori corrections. Slight modifications have nevertheless been brought to the system. For marketing purposes, and to keep the best drivers in the portfolio, many insurance companies operating in Belgium have allowed the insured drivers reaching level 0 to ‘claim for free’: whatever the number of accidents reported to the company, they stay in level 0. This makes the stationary distribution degenerate: all its probability mass is concentrated at level 0. Therefore, computations must be based on the transient distributions.

The transition probabilities for the Belgian bonus-malus scale with a super bonus level 0 are the same as before, except for the line corresponding to level 0, which is replaced with a line of 0s and a single 1 on the diagonal.
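In matrix terms, making level 0 absorbing is a one-line change. The sketch below applies it to the −1/top matrix from the first snippet only because the 23-level Belgian matrix is not reproduced here; the Belgian matrix would be modified in exactly the same way (function name ours):

```python
def super_bonus_matrix(lam_eff):
    """Transition matrix with a 'super bonus' level 0: the row of
    level 0 is replaced by (1, 0, ..., 0), so level 0 is absorbing
    and the stationary distribution degenerates at level 0."""
    P = top_scale_matrix(lam_eff)   # stand-in for the Belgian matrix
    P[0, :] = 0.0                   # wipe the row of level 0 ...
    P[0, 0] = 1.0                   # ... and put a single 1 on the diagonal
    return P
```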

8.6.2 Initial Distributions

We use the following initial distributions: the uniform distribution (1/23 of the portfolio in each level), the top distribution (78 % of the policyholders are concentrated in level 22 and the remaining 22 % are spread over levels 0 to 21) and the steady-state distribution displayed in Table 8.1. Starting with the stationary distribution gives an idea of the influence of the introduction of a super bonus level in an existing bonus-malus scale.

8.6.3 Transient Relativities

The results for the uniform initial distribution are given in Tables 8.18 (transient distributions) and 8.19 (corresponding relativities). We note that after 50 years, the majority of the policyholders (76.0 %) are in the super bonus level. Most of the other policyholders are in the upper levels. They correspond to the very bad drivers.

Concerning the transient relativities, we see that the super bonus level has a greater relativity than level 1 during the first 20 years. This side-effect of the introduction of a super bonus level is undesirable.

Table 8.20 presents the transient relativities for the top initial distribution. The relativities also show particular patterns as policyholders principally move by clusters in the scale. For example, the relativities of levels 0 to 2 are not ordered.

The transient relativities for the steady-state initial distribution are given in Table 8.21. Once again, the transient relativities are not always in ascending order.

Table 8.22 shows the \bar{r}_l's for the three initial distributions. The results for the steady-state initial distribution are not in ascending order. Indeed, the relativity associated with the super bonus level is larger than the relativities associated with levels 1 to 5.

Therefore, we see that the introduction of a super bonus level in an actual scale leads to some undesirable side-effects: concentration of the majority of the policyholders in the super bonus level after a few years and optimal relativities no longer ordered for the lowest levels (these side-effects could be overcome with the introduction of a linear scale).


Table 8.18 Evolution of the uniform initial distribution for the former Belgian compulsory bonus-malus scale with a super bonus level 0.

Level l  Pr[L_0=l]  Pr[L_10=l]  Pr[L_20=l]  Pr[L_30=l]  Pr[L_40=l]  Pr[L_50=l]
22       4.35 %     4.43 %      4.65 %      4.62 %      4.54 %      4.46 %
21       4.35 %     3.28 %      3.34 %      3.27 %      3.18 %      3.09 %
20       4.35 %     2.68 %      2.59 %      2.48 %      2.38 %      2.30 %
19       4.35 %     2.39 %      2.15 %      2.00 %      1.89 %      1.80 %
18       4.35 %     2.25 %      1.88 %      1.69 %      1.57 %      1.47 %
17       4.35 %     2.45 %      1.75 %      1.50 %      1.35 %      1.24 %
16       4.35 %     2.48 %      1.64 %      1.35 %      1.19 %      1.07 %
15       4.35 %     2.51 %      1.57 %      1.24 %      1.06 %      0.95 %
14       4.35 %     2.59 %      1.55 %      1.18 %      0.98 %      0.85 %
13       4.35 %     2.65 %      1.54 %      1.13 %      0.91 %      0.78 %
12       4.35 %     3.92 %      1.68 %      1.13 %      0.88 %      0.73 %
11       4.35 %     3.85 %      1.68 %      1.10 %      0.83 %      0.68 %
10       4.35 %     3.76 %      1.68 %      1.07 %      0.79 %      0.63 %
9        4.35 %     3.72 %      1.69 %      1.05 %      0.76 %      0.59 %
8        4.35 %     3.65 %      1.69 %      1.02 %      0.72 %      0.55 %
7        4.35 %     3.55 %      2.07 %      1.09 %      0.72 %      0.53 %
6        4.35 %     3.44 %      2.05 %      1.06 %      0.68 %      0.49 %
5        4.35 %     3.30 %      2.01 %      1.03 %      0.64 %      0.46 %
4        4.35 %     3.08 %      1.86 %      0.93 %      0.57 %      0.40 %
3        4.35 %     2.87 %      1.72 %      0.84 %      0.50 %      0.34 %
2        4.35 %     2.67 %      2.49 %      0.93 %      0.49 %      0.32 %
1        4.35 %     2.49 %      2.33 %      0.85 %      0.44 %      0.28 %
0        4.35 %     31.98 %     54.39 %     67.44 %     72.93 %     76.00 %

Table 8.19 Evolution of the transient relativities for the uniform initial distribution for the former Belgian compulsory bonus-malus scale with a super bonus level 0.

Level l  r_l^(0)  r_l^(10)  r_l^(20)  r_l^(30)  r_l^(40)  r_l^(50)
22       0.00 %   262.07 %  269.76 %  273.44 %  276.38 %  278.89 %
21       0.00 %   227.08 %  241.06 %  246.76 %  250.73 %  253.93 %
20       0.00 %   199.13 %  218.22 %  225.94 %  230.96 %  234.83 %
19       0.00 %   177.97 %  199.38 %  208.75 %  214.76 %  219.29 %
18       0.00 %   162.38 %  183.85 %  194.32 %  201.15 %  206.28 %
17       0.00 %   138.81 %  168.34 %  181.09 %  189.09 %  194.95 %
16       0.00 %   128.85 %  156.85 %  170.26 %  178.91 %  185.29 %
15       0.00 %   120.52 %  146.93 %  160.81 %  170.03 %  176.85 %
14       0.00 %   115.68 %  139.38 %  152.89 %  162.30 %  169.41 %
13       0.00 %   111.55 %  132.99 %  146.06 %  155.55 %  162.88 %
12       0.00 %   82.73 %   120.51 %  137.14 %  148.11 %  156.27 %
11       0.00 %   80.93 %   114.88 %  131.32 %  142.48 %  150.89 %
10       0.00 %   78.45 %   109.51 %  125.89 %  137.30 %  145.97 %
9        0.00 %   77.57 %   105.70 %  121.57 %  132.89 %  141.64 %
8        0.00 %   76.12 %   101.98 %  117.57 %  128.88 %  137.71 %
7        0.00 %   74.13 %   86.95 %   108.67 %  122.45 %  132.53 %
6        0.00 %   71.63 %   83.34 %   104.71 %  118.64 %  128.91 %
5        0.00 %   68.63 %   79.71 %   100.83 %  114.95 %  125.44 %
4        0.00 %   65.99 %   76.72 %   97.77 %   111.96 %  122.53 %
3        0.00 %   63.20 %   73.69 %   94.74 %   109.04 %  119.72 %
2        0.00 %   60.26 %   53.55 %   84.00 %   102.12 %  114.65 %
1        0.00 %   57.18 %   51.08 %   80.88 %   99.06 %   111.76 %
0        0.00 %   71.63 %   63.14 %   62.93 %   64.76 %   66.30 %
0 000 % 7163 % 6314 % 6293 % 6476 % 6630 %<br />

Table 8.20 Evolution <strong>of</strong> the transient relativities for the top initial distribution for the former Belgian<br />

compulsory bonus-malus scale with a super bonus level 0.<br />

Level l<br />

r 0<br />

l<br />

r 10<br />

l<br />

r 20<br />

l<br />

r 30<br />

l<br />

r 40<br />

l<br />

r 50<br />

l<br />

22 000 % 22221 % 24839 % 25892 % 26490 % 26902 %<br />

21 000 % 19477 % 22125 % 23280 % 23952 % 24420 %<br />

20 000 % 17451 % 20074 % 21298 % 22029 % 22544 %<br />

19 000 % 15898 % 18455 % 19719 % 20495 % 21049 %<br />

18 000 % 14672 % 17143 % 18426 % 19236 % 19821 %<br />

17 000 % 9441 % 13250 % 15414 % 16827 % 17847 %<br />

16 000 % 10865 % 14242 % 15920 % 16947 % 17675 %<br />

15 000 % 10237 % 13464 % 15130 % 16170 % 16914 %<br />

14 000 % 9745 % 12808 % 14450 % 15498 % 16256 %<br />

13 000 % 9322 % 12235 % 13856 % 14911 % 15681 %<br />

12 000 % 4584 % 8839 % 11105 % 12623 % 13779 %<br />

11 000 % 8093 % 10089 % 12177 % 13474 % 14391 %<br />

10 000 % 7845 % 9679 % 11741 % 13047 % 13976 %<br />

9 000 % 7757 % 9331 % 11357 % 12667 % 13605 %<br />

8 000 % 7612 % 9005 % 11005 % 12321 % 13270 %<br />

7 000 % 7413 % 5811 % 8528 % 10242 % 11495 %<br />

6 000 % 7163 % 7005 % 9500 % 11125 % 12256 %<br />

5 000 % 6863 % 6753 % 9207 % 10838 % 11980 %<br />

4 000 % 6599 % 6507 % 8952 % 10588 % 11739 %<br />

3 000 % 6320 % 6270 % 8707 % 10348 % 11508 %<br />

2 000 % 6026 % 3003 % 6363 % 8404 % 9831 %<br />

1 000 % 5718 % 5108 % 7154 % 9175 % 10560 %<br />

0 000 % 7163 % 6314 % 4542 % 5095 % 5487 %



Table 8.21 Evolution of the transient relativities for the steady-state initial distribution for the former Belgian compulsory bonus-malus scale with a super bonus level 0.

Level l    r_l^(0)    r_l^(10)    r_l^(20)    r_l^(30)    r_l^(40)    r_l^(50)
22         100.0 %    257.93 %    269.51 %    274.15 %    277.44 %    280.11 %
21         100.0 %    220.81 %    239.92 %    246.91 %    251.34 %    254.73 %
20         100.0 %    192.52 %    216.80 %    225.92 %    231.39 %    235.43 %
19         100.0 %    174.66 %    197.88 %    208.13 %    214.60 %    219.40 %
18         100.0 %    160.52 %    182.99 %    193.97 %    201.07 %    206.36 %
17         100.0 %    124.72 %    162.58 %    178.29 %    187.56 %    194.09 %
16         100.0 %    117.95 %    151.98 %    167.93 %    177.67 %    184.58 %
15         100.0 %    111.34 %    142.53 %    158.79 %    169.04 %    176.34 %
14         100.0 %    115.43 %    139.02 %    152.24 %    161.58 %    168.76 %
13         100.0 %    111.81 %    132.87 %    145.80 %    155.21 %    162.52 %
12         100.0 %     66.21 %    111.64 %    132.00 %    144.79 %    154.00 %
11         100.0 %     68.49 %    107.89 %    127.22 %    139.89 %    149.15 %
10         100.0 %     68.99 %    103.57 %    122.36 %    135.14 %    144.58 %
9          100.0 %     80.04 %    105.88 %    121.30 %    132.35 %    140.95 %
8          100.0 %     79.91 %    102.09 %    117.34 %    128.54 %    137.29 %
7          100.0 %     78.24 %     76.19 %    101.95 %    117.93 %    129.33 %
6          100.0 %     75.25 %     75.20 %     99.48 %    115.11 %    126.43 %
5          100.0 %     71.18 %     73.30 %     96.51 %    112.02 %    123.41 %
4          100.0 %     78.33 %     76.60 %     97.28 %    111.37 %    121.84 %
3          100.0 %     73.15 %     73.69 %     94.24 %    108.54 %    119.21 %
2          100.0 %     67.74 %     42.42 %     76.29 %     96.72 %    110.74 %
1          100.0 %     62.28 %     42.83 %     75.07 %     94.93 %    108.74 %
0          100.0 %     92.41 %     87.94 %     84.93 %     84.85 %     85.14 %

Table 8.22 Relativities computed on the basis of the transient maximum accuracy criterion for the former Belgian compulsory bonus-malus scale with a super bonus level 0.

Level l    Uniform distribution ¯r_l    Top distribution ¯r_l    Steady-state distribution ¯r_l
22                267.5 %                     241.0 %                    265.2 %
21                234.6 %                     184.8 %                    227.8 %
20                207.9 %                     163.3 %                    199.0 %
19                185.9 %                     146.8 %                    176.8 %
18                168.0 %                     133.8 %                    159.5 %
17                163.5 %                     123.4 %                    146.1 %
16                141.8 %                     114.9 %                    135.6 %
15                132.3 %                     107.8 %                    127.2 %
14                124.4 %                     101.8 %                    120.4 %
13                117.7 %                      96.6 %                    114.6 %
12                111.9 %                      92.1 %                    109.7 %
11                106.9 %                      88.0 %                    105.5 %
10                102.4 %                      84.4 %                    101.7 %
9                  98.1 %                      81.0 %                     97.9 %
8                  94.0 %                      77.8 %                     94.4 %
7                  90.1 %                      74.8 %                     91.2 %
6                  86.5 %                      71.8 %                     88.1 %
5                  83.2 %                      69.1 %                     85.2 %
4                  78.3 %                      65.6 %                     79.8 %
3                  74.0 %                      62.4 %                     75.9 %
2                  70.3 %                      59.5 %                     72.3 %
1                  67.1 %                      56.6 %                     69.2 %
0                  65.5 %                      51.4 %                     87.6 %

8.7 Further Reading and Bibliographic Notes

There are relatively few papers dealing with the study of bonus-malus scales using transient distributions. The study of bonus-malus scales with transient distributions started with the seminal paper by Borgan, Hoem & Norberg (1981). Gilde & Sundt (1989) studied linear scales in a transient regime. The examples mentioned in this chapter (special scale for new entrants, and absorbing level 0) are extensively treated in Denuit, Maréchal, Pitrebois & Walhin (2007b).


9

Actuarial Analysis of the French Bonus-Malus System

9.1 Introduction

As discussed in the preceding chapters, bonus-malus systems usually take the form of a scale comprising a number of levels. The policyholders move inside the scale according to the number of claims they report to the insurance company. To each level of the scale is attached a relativity (that is, a percentage, or relative premium). These relativities are applied to a base premium. Usually, bonus-malus systems may be modelled through a Markov chain, which makes the mathematics easy for the actuary.

France is an exception. The French law imposes on the insurers operating in France a unique bonus-malus system that is not based on a scale. Instead, the French bonus-malus system uses the concept of an increase-decrease coefficient (coefficient de réduction-majoration in French, henceforth abbreviated as CRM). More precisely, the French bonus-malus system implies a malus of 25 % per claim and a bonus of 5 % per claim-free year. Each policyholder is thus assigned a base premium, and this base premium is adapted according to the number of claims reported to the insurer: the premium is multiplied by 1.25 each time an accident at fault is reported to the company, and by 0.95 per claim-free year. In the case of shared responsibility, the increase is reduced by half (12.5 % instead of 25 %). Note that these increases are applied to the previous relativity: the first claim causes the premium to pass from 100 to 125, the second increases the premium to 156, the third to 195, and so on (all the numbers being rounded down). The penalties are thus convex in the number of claims reported by the driver, ensuring that the more claims are reported, the heavier the penalty. The highest percentage is 350, and the lowest is 50 (attained after 13 consecutive claim-free years). According to the French special bonus rule, after two consecutive years without a claim, the driver goes back to the initial level of 100 %. This special bonus rule is particularly generous.
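To make the premium arithmetic concrete, the short Python sketch below (an illustration added here, not taken from the original text) reproduces the figures just quoted; the 350 % ceiling, the 50 % floor and the rounding-down convention follow the description above.

```python
import math

def apply_claim(coeff, shared=False):
    """One at-fault claim: +25 % (or +12.5 % when responsibility is shared)."""
    factor = 1.125 if shared else 1.25
    return min(math.floor(coeff * factor), 350)   # legal ceiling of 350 %

def apply_claim_free_year(coeff):
    """One claim-free year: -5 %, floored at the legal minimum of 50 %."""
    return max(math.floor(coeff * 0.95), 50)

coeff = 100
for _ in range(3):                 # three claims in a row
    coeff = apply_claim(coeff)
    print(coeff)                   # 125, 156, 195

coeff, years = 100, 0
while coeff > 50:                  # consecutive claim-free years from 100 %
    coeff = apply_claim_free_year(coeff)
    years += 1
print(years)                       # 13 years to reach the 50 % floor
```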




In 1994, the European Union decreed that all its member countries had to drop their mandatory bonus-malus systems, claiming that such systems reduced competition between insurers and were in contradiction to the total rating freedom implemented by the Third Directive. However, the mandatory French system is still in force. Quite surprisingly, the European Court of Justice decided in 2004 that the mandatory bonus-malus systems of France and the Grand Duchy of Luxembourg were not in contradiction to the rating freedom imposed by the European legislation. These two countries were thus allowed to stick to their respective uniform bonus-malus mechanisms.

In this chapter, we show that the framework of credibility theory can be used to analyse the French bonus-malus system. Specifically, the greatest accuracy credibility approach presented in Chapter 3 is adapted to fit the CRM coefficients: the actuary resorts to a quadratic loss function, but the shape of the credibility predictor is constrained ex ante to the form imposed by the French law. Let us mention that the approach developed in this chapter is not the only possible method to deal with CRM coefficients. It has been shown in Kelle (2000) that the French bonus-malus system corresponds to a scale comprising several hundred levels (530 levels, precisely), which can be analysed in the Markovian setting of Chapter 4. The large number of states needed is due to the malus reduction in the case of claims with shared responsibility, forcing the author to consider the pair (number of claims with whole responsibility, number of claims with partial liability) to make the computation. The form of the transition matrix is somewhat intricate, and we believe that the alternative developed in this chapter offers an appropriate treatment of the CRMs.

Let us now detail the contents of this chapter. In Section 9.2, we model the CRMs and we compute the parameters involved in the French bonus-malus system. We also examine whether the bonus-malus system is financially balanced or not. Some numerical applications illustrate the methodological results. Section 9.3 discusses a special rule associated with the French bonus-malus system: claims for which the policyholder is only partially liable entail a reduced penalty. The impact of this reduction is evaluated, and numerical illustrations are discussed. The final Section 9.4 concludes with bibliographic notes.

9.2 French Bonus-Malus System

9.2.1 Modelling Claim Frequencies

We adopt here the framework of the preceding chapters. Let us pick at random a policyholder from the portfolio. We denote as N_t the number of claims reported by this policyholder in period t. We assume that N_t is Poisson distributed with parameter λΘ, where λ represents the annual mean claim frequency in the portfolio (or in the risk class in the case of a segmented tariff) and Θ is a positive random effect accounting for the heterogeneity present in the portfolio. Given Θ = θ, the conditional probability mass function of N_t is Poi(λθ). We further assume that E[Θ] = 1, so that E[N_t] = λ. The heterogeneity present in the portfolio is described by a structure function. Formally, the structure function is the probability density function f_Θ of Θ. Therefore, the unconditional probability mass function of N_t is mixed Poisson. Furthermore, the random variables N_1, N_2, N_3, … are assumed to be independent and identically distributed given the risk proneness Θ of the policyholder. Since Θ is unknown to the insurer, this induces serial dependence among the N_t s.



9.2.2 Probability Generating Functions of Random Vectors

In this chapter we will use multivariate models for counting random vectors. Specifically, let us consider random vectors M = (M_1, …, M_n)^T valued in ℕ^n. The multivariate probability mass function of M is

p_M(k_1, …, k_n) = Pr[M_1 = k_1, …, M_n = k_n].

Throughout the chapter we will extensively use the multivariate extension of the probability generating function introduced in Chapter 1, which is defined as

φ_M(z) = E[z_1^{M_1} ⋯ z_n^{M_n}] = Σ_{k_1=0}^∞ ⋯ Σ_{k_n=0}^∞ z_1^{k_1} ⋯ z_n^{k_n} p_M(k_1, …, k_n).

Let us now point out several interesting properties of multivariate probability generating functions. If any function that is known to be a multivariate probability generating function for a random vector M is expanded as a power series in z, then the coefficient of z_1^{k_1} ⋯ z_n^{k_n} must be p_M(k_1, …, k_n). Furthermore,

• z ↦ φ_M(z, z, …, z) is the probability generating function of M_1 + ⋯ + M_n;
• z ↦ φ_M(z, 1, …, 1) is the probability generating function of M_1;
• φ_M(z_1, …, z_n) = φ_{M_1}(z_1) ⋯ φ_{M_n}(z_n) when the random variables M_1, …, M_n are independent.
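As a quick sanity check of these properties, the following sketch (an illustration we add here, not part of the original text) builds the joint probability mass function of two independent Poisson variables on a truncated grid and verifies the marginalization and summation rules numerically.

```python
import numpy as np
from scipy.stats import poisson

K = 60                                    # truncation point; tail mass is negligible
k = np.arange(K)
p_M = np.outer(poisson.pmf(k, 1.3), poisson.pmf(k, 0.7))   # joint pmf, independent case

def pgf(z1, z2):
    """phi_M(z1, z2) = sum over (k1, k2) of z1^k1 z2^k2 p_M(k1, k2)."""
    return float(np.sum(np.outer(z1 ** k, z2 ** k) * p_M))

z = 0.4
print(np.isclose(pgf(z, 1.0), np.exp(1.3 * (z - 1.0))))        # PGF of M1 alone
print(np.isclose(pgf(z, z), np.exp((1.3 + 0.7) * (z - 1.0))))  # PGF of M1 + M2
```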

9.2.3 CRM Coefficients

We will assume that the CRM coefficients only depend on the observed number of reported claims and not on their severity. Therefore the base premium is simply multiplied by a constant (essentially the expected cost of a claim).

Let δ_t be the 'reduction' coefficient and γ_t be the 'majoration' coefficient applying to a policyholder who has been covered for t years. The CRM coefficient for years 1 to t then becomes

r_{γ_t,δ_t}(N_•, I_•, t) = (1 + γ_t)^{N_•} (1 − δ_t)^{I_•}

with

N_• = Σ_{j=1}^t N_j and I_• = Σ_{j=1}^t I_j,    (9.1)

where I_j is defined as

I_j = 1 if N_j = 0, and I_j = 0 if N_j ≥ 1.


In words, N_• is the total number of claims reported by the policyholder during the period (0, t) and I_• is the number of years without any claim reported to the company. Note that the CRM coefficients depend on t, so that the penalties and discounts may change for every (0, t) period.
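The following minimal Python sketch (added here for illustration; the function name and parameter spellings are ours) evaluates the CRM coefficient of formula (9.1) for a given claims history.

```python
def crm(claims, gamma, delta):
    """CRM coefficient (1 + gamma)^N_dot * (1 - delta)^I_dot, where N_dot is
    the total claim count over the observed years and I_dot the number of
    claim-free years among them, as in formula (9.1)."""
    n_dot = sum(claims)
    i_dot = sum(1 for n in claims if n == 0)
    return (1 + gamma) ** n_dot * (1 - delta) ** i_dot

# a policyholder observed for t = 3 years, with a single claim in year 2
print(crm([0, 1, 0], gamma=0.3690, delta=0.0727))   # about 1.177, i.e. 118 %
```

With the year-3 coefficients of Table 9.1 below, this reproduces the 118 % entry of Table 9.2.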

To obtain the parameters γ_t and δ_t, we minimize the expected squared difference between the 'true' relative premium Θ and the relative premium r_{γ_t,δ_t} applicable to the policyholder according to the French-type bonus-malus system. More specifically, for a policyholder observed during t years, having filed N_1, N_2, …, N_t claims, we aim to determine γ_t and δ_t so as to minimize the objective function

Q(t) = E[(Θ − r_{γ,δ}(N_•, I_•, t))²]

with respect to the arguments γ and δ. We therefore have to solve the first-order conditions

∂Q(t)/∂γ = 0 and ∂Q(t)/∂δ = 0,

which can be rewritten as

E[N_• (1 + γ)^{N_• − 1} (1 − δ)^{I_•} (Θ − (1 + γ)^{N_•} (1 − δ)^{I_•})] = 0
E[I_• (1 + γ)^{N_•} (1 − δ)^{I_• − 1} (Θ − (1 + γ)^{N_•} (1 − δ)^{I_•})] = 0

⟺

E[Θ N_• (1 + γ)^{N_• − 1} (1 − δ)^{I_•}] = E[N_• (1 + γ)^{2N_• − 1} (1 − δ)^{2I_•}]
E[Θ I_• (1 + γ)^{N_•} (1 − δ)^{I_• − 1}] = E[I_• (1 + γ)^{2N_•} (1 − δ)^{2I_• − 1}].    (9.2)

9.2.4 Computation of the CRMs at Time t

Let us define the conditional probability generating function of the random couple (N_•, I_•) given Θ = θ as

φ(z_1, z_2 | θ) = E[z_1^{N_•} z_2^{I_•} | Θ = θ].

The conditional independence assumption on N_1, N_2, …, N_t allows us to write

φ(z_1, z_2 | θ) = ∏_{j=1}^t E[z_1^{N_j} z_2^{I_j} | Θ = θ] = (e^{−λθ}(z_2 − 1) + e^{λθ(z_1 − 1)})^t.

We can then rewrite the system (9.2) as

2 E[Θ φ^{(1,0)}(1 + γ, 1 − δ | Θ)] = E[φ_2^{(1,0)}(1 + γ, 1 − δ | Θ)]
2 E[Θ φ^{(0,1)}(1 + γ, 1 − δ | Θ)] = E[φ_2^{(0,1)}(1 + γ, 1 − δ | Θ)]

where

φ^{(x,y)}(a, b | θ) = [∂^{x+y} φ(s, t | θ) / ∂s^x ∂t^y] evaluated at s = a, t = b,
φ_2^{(x,y)}(a, b | θ) = [∂^{x+y} φ(s², t² | θ) / ∂s^x ∂t^y] evaluated at s = a, t = b,

for x, y ∈ {0, 1}.

We can rewrite the first-order conditions as

∫_0^∞ θ² e^{λθγ} (e^{λθγ} − δ e^{−λθ})^{t−1} f_Θ(θ) dθ
  = (1 + γ) ∫_0^∞ θ e^{λθ(2γ+γ²)} (e^{λθ(2γ+γ²)} + (δ² − 2δ) e^{−λθ})^{t−1} f_Θ(θ) dθ

and

∫_0^∞ θ e^{−λθ} (e^{λθγ} − δ e^{−λθ})^{t−1} f_Θ(θ) dθ
  = (1 − δ) ∫_0^∞ e^{−λθ} (e^{λθ(2γ+γ²)} + (δ² − 2δ) e^{−λθ})^{t−1} f_Θ(θ) dθ.

These equations do not possess a closed-form solution, contrary to the Markovian systems studied in Chapter 3. Nevertheless, they can be solved numerically, combining a numerical integration algorithm with either an algorithm for solving systems of nonlinear equations or an optimisation algorithm, depending on what type of procedure is available. In the numerical illustrations proposed in this chapter, we have used an optimisation algorithm from the SAS/IML package, minimizing the sum of the squared differences between the left-hand and right-hand sides of the two equations displayed above.
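Numerically, the pair (γ_t, δ_t) can be recovered with standard scientific-computing tools. The sketch below is a minimal Python substitute for the SAS/IML routines mentioned above, assuming (as in Section 9.2.8) that Θ follows the unit-mean Gamma density f_Θ(θ) = a^a θ^{a−1} e^{−aθ}/Γ(a) and using the two integral equations just displayed; for t = 1 it should return values close to γ_1 = 0.6145 and δ_1 = 0.1423 of Table 9.1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve
from scipy.special import gamma as gammafun

lam, a = 0.1474, 0.889                       # Portfolio A estimates (Section 9.2.8)

def f_theta(th):
    """Gamma structure function with unit mean."""
    return a ** a * th ** (a - 1) * np.exp(-a * th) / gammafun(a)

def integral(h):
    return quad(lambda th: h(th) * f_theta(th), 0, np.inf)[0]

def foc(params, t):
    g, d = params                            # gamma_t and delta_t
    g2 = 2 * g + g * g                       # (1 + gamma)^2 - 1
    A = lambda th: np.exp(lam * th * g) - d * np.exp(-lam * th)
    B = lambda th: np.exp(lam * th * g2) + (d * d - 2 * d) * np.exp(-lam * th)
    eq1 = integral(lambda th: th ** 2 * np.exp(lam * th * g) * A(th) ** (t - 1)) \
        - (1 + g) * integral(lambda th: th * np.exp(lam * th * g2) * B(th) ** (t - 1))
    eq2 = integral(lambda th: th * np.exp(-lam * th) * A(th) ** (t - 1)) \
        - (1 - d) * integral(lambda th: np.exp(-lam * th) * B(th) ** (t - 1))
    return [eq1, eq2]

for t in (1, 2, 3, 4):
    print(t, fsolve(foc, x0=[0.5, 0.1], args=(t,)))
```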

9.2.5 Global CRM

Note that so far we have obtained a numerical solution for each t: minimizing Q(t) with respect to γ and δ gives the optimal solution (γ_t, δ_t) for the period (0, t). However, we want to obtain a unique set of CRM coefficients. These may be obtained in the transient setting developed in the preceding chapter. To this end, let us introduce the age structure of the portfolio. Specifically, we denote as A the number of years the driver is covered by the company, and as N_1, N_2, …, N_A the annual numbers of claims reported by this policyholder. Note that A is a random variable, since we work with a policyholder picked at random from the portfolio. The idea is then to determine γ and δ so as to minimize E[Q(A)]. The objective function then becomes

E[Q(A)] = Σ_{t=1}^∞ a_t Q(t), where a_t = Pr[A = t],

to be minimized with respect to the parameters γ and δ.



Some algebra immediately leads to the following system of equations to solve:

Σ_{t=1}^∞ a_t t ∫_0^∞ θ² e^{λθγ} (e^{λθγ} − δ e^{−λθ})^{t−1} f_Θ(θ) dθ
  = (1 + γ) Σ_{t=1}^∞ a_t t ∫_0^∞ θ e^{λθ(2γ+γ²)} (e^{λθ(2γ+γ²)} + (δ² − 2δ) e^{−λθ})^{t−1} f_Θ(θ) dθ

and

Σ_{t=1}^∞ a_t t ∫_0^∞ θ e^{−λθ} (e^{λθγ} − δ e^{−λθ})^{t−1} f_Θ(θ) dθ
  = (1 − δ) Σ_{t=1}^∞ a_t t ∫_0^∞ e^{−λθ} (e^{λθ(2γ+γ²)} + (δ² − 2δ) e^{−λθ})^{t−1} f_Θ(θ) dθ.

Again, this system does not admit any closed-form solution, but can be solved numerically (using an appropriate SAS/IML optimization algorithm).
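As a cross-check of this weighted optimisation, the following Monte Carlo sketch (our illustration, not the book's SAS/IML implementation) minimizes the empirical counterpart of E[Q(A)] under the hypothetical age distribution introduced below in Section 9.2.8; it should land near the values γ = 0.0710 and δ = 0.0133 reported there.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
lam, a = 0.1474, 0.889
ages = np.array([1, 5, 12, 20, 25, 30])          # hypothetical age structure
probs = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1]) # of Section 9.2.8

m, horizon = 200_000, int(ages.max())
theta = rng.gamma(a, 1.0 / a, size=m)            # Gamma structure, E[Theta] = 1
A = rng.choice(ages, size=m, p=probs)
N = rng.poisson(lam * theta[:, None], size=(m, horizon))
mask = np.arange(horizon)[None, :] < A[:, None]  # keep each policy's first A years
N_dot = (N * mask).sum(axis=1)                   # total number of claims
I_dot = ((N == 0) & mask).sum(axis=1)            # number of claim-free years

def objective(p):
    g, d = p
    r = (1 + g) ** N_dot * (1 - d) ** I_dot
    return np.mean((theta - r) ** 2)             # empirical E[Q(A)]

res = minimize(objective, x0=[0.07, 0.01], method="Nelder-Mead")
print(res.x)                                     # near gamma = 0.0710, delta = 0.0133
```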

Remark 9.1 Note that here, we have performed an averaging with respect to the age structure of the portfolio. In the case where the portfolio is partitioned into a series of risk classes, an average with respect to the composition of the portfolio (in terms of classification variables) could also be performed. If some explanatory variables are correlated with A, care must be taken in the second averaging.

9.2.6 Multivariate Panjer and De Pril Recursive Formulas

Notations

In Sections 9.2.7 and 9.3.3, we will need the bivariate and trivariate extensions of the Panjer algorithm that was described in Section 7.2. The present section is devoted to the presentation of this method, as well as a particular case for the sum of independent and identically distributed random vectors, known as the multivariate De Pril recursive formula.

Assume independent and identically distributed realizations of possibly dependent losses X_i = (X_{i1}, …, X_{ik})^T affected by a common event, denoted by the counting variable N. For example, N may count the number of hurricanes hitting the United States and the X_{ij}s may represent the cost of hurricane i in state numbered j, j = 1, …, k. It is natural to try to obtain the distribution of the aggregate claim

S = (S_1, …, S_k)^T = (Σ_{i=1}^N X_{i1}, …, Σ_{i=1}^N X_{ik})^T.    (9.3)

Even when the components of X are independent, the components of S will have some positive dependence due to the common counter N.

When N belongs to the Panjer family of counting random variables, let us show that a multivariate version of Panjer's recursive formula emerges. To this end, we will use the following notations:
i=1



• the probability mass function of S is denoted as g(s) = Pr[S_1 = s_1, …, S_k = s_k];
• the probability mass function of X is denoted as f(x) = Pr[X_1 = x_1, …, X_k = x_k];
• the difference between vectors has to be understood componentwise, that is, s − x = (s_1 − x_1, …, s_k − x_k)^T;
• and, finally, Σ_{x≠0}^s f(x) = Σ_{x_1=0}^{s_1} ⋯ Σ_{x_k=0}^{s_k} f(x_1, …, x_k) − f(0, …, 0).

Multivariate Panjer Algorithm

We are now ready to state and prove the following result.

Property 9.1 Let S be as in (9.3), with X_i = (X_{i1}, …, X_{ik})^T, i = 1, 2, …, independent and identically distributed, arithmetic and independent of N. Furthermore, we assume that N belongs to Panjer's class, i.e. its probability mass function satisfies (7.7). Then, if φ_N denotes the probability generating function of N, we have

g(0) = φ_N(f(0))    (9.4)

g(s) = 1/(1 − a f(0)) Σ_{x≠0}^s (a + b x_i / s_i) g(s − x) f(x), s_i ≥ 1, i = 1, …, k.    (9.5)

Proof Let φ_X(·) and φ_S(·) be the probability generating functions of X and S, respectively. From

g(s) = Σ_{n=0}^∞ Pr[N = n] f^{⋆n}(s)

we get

φ_S(u) = Σ_{n=0}^∞ Pr[N = n] (φ_X(u))^n,

from which (9.4) follows immediately. By hypothesis, one has

n Pr[N = n] = a(n − 1) Pr[N = n − 1] + (a + b) Pr[N = n − 1], n ≥ 1.

Multiplying both sides of the equality by (φ_X(u))^{n−1} u_i ∂φ_X(u)/∂u_i and summing over n = 1 to +∞, we get

u_i ∂φ_S(u)/∂u_i = a φ_X(u) u_i ∂φ_S(u)/∂u_i + (a + b) u_i (∂φ_X(u)/∂u_i) φ_S(u).

Comparing identical powers of u on both sides of the equality gives

s_i g(s) = a Σ_{x=0}^s f(x)(s_i − x_i) g(s − x) + (a + b) Σ_{x=0}^s f(x) x_i g(s − x)
⟺ s_i g(s) = a s_i f(0) g(s) + Σ_{x≠0}^s f(x) g(s − x)(a s_i + b x_i)
⟺ g(s) = 1/(1 − a f(0)) Σ_{x≠0}^s f(x) g(s − x)(a + b x_i / s_i),

which ends the proof. □

The Panjer formula in dimension 1 immediately follows from Property 9.1 by putting k = 1.
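To illustrate the recursion, here is a small bivariate implementation in Python (our sketch; the Poisson example at the bottom, with a = 0 and b = λ, is an assumption chosen for checking purposes, not an example from the book).

```python
import numpy as np

def bivariate_panjer(a, b, pgf_N, f, smax):
    """Aggregate pmf g of S = X_1 + ... + X_N for a bivariate claim pmf f
    (array f[x1, x2]) and a counting variable N in Panjer's class,
    following formulas (9.4) and (9.5)."""
    g = np.zeros((smax + 1, smax + 1))
    g[0, 0] = pgf_N(f[0, 0])                         # formula (9.4)
    for s1 in range(smax + 1):
        for s2 in range(smax + 1):
            if s1 == 0 and s2 == 0:
                continue
            i, si = (0, s1) if s1 >= 1 else (1, s2)  # any coordinate with s_i >= 1
            tot = 0.0
            for x1 in range(min(s1, f.shape[0] - 1) + 1):
                for x2 in range(min(s2, f.shape[1] - 1) + 1):
                    if x1 == 0 and x2 == 0:
                        continue
                    xi = x1 if i == 0 else x2
                    tot += (a + b * xi / si) * g[s1 - x1, s2 - x2] * f[x1, x2]
            g[s1, s2] = tot / (1.0 - a * f[0, 0])
    return g

# Poisson(2) frequency (a = 0, b = 2); X puts mass on (1,0), (0,1) and (1,1)
f = np.zeros((2, 2))
f[1, 0], f[0, 1], f[1, 1] = 0.5, 0.3, 0.2
g = bivariate_panjer(0.0, 2.0, lambda z: np.exp(2.0 * (z - 1.0)), f, smax=25)
print(g.sum())        # close to 1 once the grid is large enough
```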

Multivariate De Pril Algorithm

Another interesting problem is to compute the multivariate convolution of a random vector. This is actually the result we will need in the following sections. Let X_i = (X_{i1}, …, X_{ik})^T, i = 1, 2, …, n, be independent and identically distributed realizations of the random vector X = (X_1, …, X_k)^T. We want to obtain the distribution of the random vector

S = (S_1, …, S_k)^T = (Σ_{i=1}^n X_{i1}, …, Σ_{i=1}^n X_{ik})^T.    (9.6)

The probability mass function of S is the n-fold convolution f^{⋆n}(x). The multivariate version of De Pril's formula provides a recursion to derive the distribution of S.

Property 9.2 Let X_i = (X_{i1}, …, X_{ik})^T be independent and identically distributed realizations of the random vector X defined on the nonnegative integers and with probability mass function such that f(0) > 0. Then the following recursion holds:

f^{⋆n}(0) = f^n(0)
f^{⋆n}(s) = (1/f(0)) Σ_{x≠0}^s ((n + 1) x_i / s_i − 1) f^{⋆n}(s − x) f(x), s_i ≥ 1, i = 1, …, k.

Proof Let us introduce an auxiliary random vector W with probability mass function h(0) = 0 and

h(x) = f(x)/(1 − f(0)), x > 0,

and the auxiliary random variable N ∼ Bin(n, 1 − f(0)). The probability generating function of (W_{11} + ⋯ + W_{1N}, …, W_{k1} + ⋯ + W_{kN})^T is given by

(1 − (1 − f(0))(1 − φ_W(u)))^n = (φ_X(u))^n,

from which we conclude that S is distributed as (W_{11} + ⋯ + W_{1N}, …, W_{k1} + ⋯ + W_{kN})^T. Applying Property 9.1 with

a = (f(0) − 1)/f(0) and b = (1 − f(0))(n + 1)/f(0),

we get the desired result. □
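The multivariate De Pril recursion is equally easy to code; the Python sketch below (ours, with an arbitrary toy pmf as input) computes an exact bivariate n-fold convolution.

```python
import numpy as np

def depril_convolution(f, n):
    """n-fold convolution f^{*n} of a bivariate pmf f (array f[x1, x2],
    with f[0, 0] > 0), following the recursion of Property 9.2."""
    X1, X2 = f.shape
    fn = np.zeros((n * (X1 - 1) + 1, n * (X2 - 1) + 1))
    fn[0, 0] = f[0, 0] ** n
    for s1 in range(fn.shape[0]):
        for s2 in range(fn.shape[1]):
            if s1 == 0 and s2 == 0:
                continue
            i, si = (0, s1) if s1 >= 1 else (1, s2)
            tot = 0.0
            for x1 in range(min(s1, X1 - 1) + 1):
                for x2 in range(min(s2, X2 - 1) + 1):
                    if x1 == 0 and x2 == 0:
                        continue
                    xi = x1 if i == 0 else x2
                    tot += ((n + 1) * xi / si - 1.0) * fn[s1 - x1, s2 - x2] * f[x1, x2]
            fn[s1, s2] = tot / f[0, 0]
    return fn

f = np.array([[0.4, 0.3],
              [0.2, 0.1]])                # toy bivariate pmf with mass at the origin
print(depril_convolution(f, 3).sum())     # 1.0 (exact convolution)
```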

9.2.7 Analysis of the Financial Equilibrium of the French Bonus-Malus System

An interesting property of the relativities associated with Markovian bonus-malus systems and obtained through Norberg's least-squares criterion is that they make the bonus-malus system financially balanced, i.e. the premium income of the insurer neither increases nor decreases over time (on average). In this section, we would like to check whether or not the French-type bonus-malus system enjoys this property.

More precisely, once γ_t and δ_t have been obtained, we would like to verify whether E[r_{γ_t,δ_t}(N_•, I_•, t)] is equal to 1, where N_• and I_• are as defined in (9.1). The computation of E[r_{γ_t,δ_t}(N_•, I_•, t)] requires knowledge of the joint distribution of the random couple (N_•, I_•). Let us denote as

f(x, y | θ) = Pr[N_1 = x, I_1 = y | Θ = θ]

the joint discrete mass function of the random couple (N_1, I_1), conditional on Θ = θ, and as

f^{⋆t}(x, y | θ) = Pr[N_• = x, I_• = y | Θ = θ]

the joint discrete mass function of the random couple (N_•, I_•) defined in (9.1), conditional on Θ = θ. We then have the following result.

Property 9.3 For fixed θ, the following recursive formulas

g^{⋆t}(x, y | θ) = f^{⋆t}(x, t − y | θ) for 0 ≤ y ≤ t and x > 0
g^{⋆t}(0, 0 | θ) = e^{−λθt}
f(x, 0 | θ) = e^{−λθ} (λθ)^x / x! for x > 0
f(0, 1 | θ) = e^{−λθ}
g^{⋆t}(x, y | θ) = e^{λθ} Σ_{u=1}^x ((t + 1)/y − 1) g^{⋆t}(x − u, y − 1 | θ) g(u, 1 | θ) for y ≥ 1 and x ≥ 1

hold true, with the convention that the functions take the value 0 where they have not been defined.

Proof It is trivial that for t = 1 we have

f(x, 0 | θ) = e^{−λθ} (λθ)^x / x!, x > 0,
f(0, 1 | θ) = e^{−λθ}.

As f^{⋆t} is the t-fold convolution of a lattice random vector, we are in a position to apply the bivariate extension of De Pril's algorithm given in Property 9.2. As that algorithm needs a mass at the origin, we define the auxiliary probability mass function

g^{⋆t}(x, y | θ) = f^{⋆t}(x, t − y | θ).

We find

g^{⋆t}(0, 0 | θ) = e^{−λθt}

g^{⋆t}(x, y | θ) = e^{λθ} (Σ_{u=0}^x ((t + 1)u/x − 1) g^{⋆t}(x − u, y − 1 | θ) g(u, 1 | θ)
  + Σ_{u=1}^x ((t + 1)u/x − 1) g^{⋆t}(x − u, y | θ) g(u, 0 | θ)), x ≥ 1,

g^{⋆t}(x, y | θ) = e^{λθ} (Σ_{u=0}^x ((t + 1)/y − 1) g^{⋆t}(x − u, y − 1 | θ) g(u, 1 | θ)
  + Σ_{u=1}^x (−1) g^{⋆t}(x − u, y | θ) g(u, 0 | θ)), y ≥ 1.

Because g(0, 1 | θ) = 0 and g(x, 0 | θ) = 0 for x > 0, we obtain

g^{⋆t}(0, 0 | θ) = e^{−λθt}
g^{⋆t}(x, y | θ) = e^{λθ} Σ_{u=1}^x ((t + 1)u/x − 1) g^{⋆t}(x − u, y − 1 | θ) g(u, 1 | θ), x ≥ 1,
g^{⋆t}(x, y | θ) = e^{λθ} Σ_{u=1}^x ((t + 1)/y − 1) g^{⋆t}(x − u, y − 1 | θ) g(u, 1 | θ), y ≥ 1.

Because g^{⋆t}(x, 0 | θ) = 0 for x > 0, only the second recursive formula has to be used. This formula is numerically stable because (t + 1)/y − 1 > 0. □

In order to obtain the unconditional probability mass function

f^{⋆t}(x, y) = Pr[N_• = x, I_• = y]

of (N_•, I_•), it suffices to integrate the conditional mass function f^{⋆t}(x, y | θ) with respect to the structure function f_Θ, that is,

f^{⋆t}(x, y) = ∫_0^∞ f^{⋆t}(x, y | θ) f_Θ(θ) dθ, x ≥ 0, 0 ≤ y ≤ t.

These quantities can then be used to evaluate E[r_{γ_t,δ_t}(N_•, I_•, t)].
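The following Python sketch (ours; it assumes the Gamma structure function with the Portfolio A estimates introduced in the next subsection) combines the recursion of Property 9.3 with numerical integration to evaluate E[r_{γ_t,δ_t}(N_•, I_•, t)]; for t = 1 with γ_1 = 0.6145 and δ_1 = 0.1423 it should reproduce the value 0.9761 of Table 9.1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import factorial, gamma as gammafun

lam, a = 0.1474, 0.889

def f_theta(th):
    return a ** a * th ** (a - 1) * np.exp(-a * th) / gammafun(a)

def joint_pmf(t, theta, xmax):
    """f^{*t}(x, i | theta) for (N_dot, I_dot), via Property 9.3."""
    m = lam * theta
    u = np.arange(xmax + 1)
    g1 = np.exp(-m) * m ** u / factorial(u)       # g(u, 1 | theta) = f(u, 0 | theta)
    g = np.zeros((xmax + 1, t + 1))               # g[x, y], with y = t - I_dot
    g[0, 0] = np.exp(-m * t)
    for y in range(1, t + 1):
        for x in range(1, xmax + 1):
            s = sum(g[x - k, y - 1] * g1[k] for k in range(1, x + 1))
            g[x, y] = np.exp(m) * ((t + 1) / y - 1.0) * s
    return g[:, ::-1]                             # reorder columns so that i = t - y

def expected_crm(t, gam, delta, xmax=40):
    r = (1 + gam) ** np.arange(xmax + 1)[:, None] \
        * (1 - delta) ** np.arange(t + 1)[None, :]
    integrand = lambda th: np.sum(joint_pmf(t, th, xmax) * r) * f_theta(th)
    return quad(integrand, 0, np.inf, limit=200)[0]

print(expected_crm(1, 0.6145, 0.1423))            # about 0.9761, as in Table 9.1
```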

9.2.8 Numerical Illustration

We will assume that Θ is Gamma distributed, with probability density function given by (1.35). The parameters a and λ are estimated on the basis of Portfolio A, that is, λ̂ = 0.1474 and â = 0.889.

Table 9.1 displays, for different values of t, the coefficients γ_t and δ_t obtained by solving the system given in Section 9.2.4. We observe a dramatic decrease of the values of γ_t and δ_t over time. The last column of the table allows us to verify the financial equilibrium of the system. The total premium income first decreases to 97.61 % and then increases to 107.29 % after 30 years. The discount per claim-free year decreases from 14.23 % to about 1 %. Similarly, the penalty induced by each reported claim decreases from 61.45 % to 6.09 %. The a posteriori corrections are therefore considerably softened with time.

The decrease of γ_t and δ_t with time t that is apparent from Table 9.1 can be explained as follows: the aim is that r_{γ_t,δ_t} be as close as possible to the unknown risk parameter Θ. Since Θ does not depend on t, whereas N_• and I_• are almost surely nondecreasing with t, the optimal parameters γ_t and δ_t must decrease to compensate for the increase in N_• and I_•. This is why averaging over time is needed.

Table 9.2 gives the CRM coefficient r_{γ_t,δ_t}(x, y) = (1 + γ_t)^x (1 − δ_t)^y for different periods of length t and for different values of the total number of claims x. The index t.y means that there are y claim-free years during the period (0, t). For the sake of comparison, Table 9.3 gives the CRM coefficients obtained from classical Bayesian credibility. In this case, the a priori annual expected claim frequency is multiplied by (â + x)/(â + λ̂t), as discussed in Chapter 3. We observe some large discrepancies between the values listed in Tables 9.2 and 9.3.

Table 9.1 Parameters γ_t and δ_t and financial equilibrium for different values of t.

t     γ_t      δ_t      Financial equilibrium
1     0.6145   0.1423   0.9761
2     0.4595   0.0955   0.9985
3     0.3690   0.0727   1.0001
4     0.3092   0.0589   1.0099
10    0.1585   0.0279   1.0431
20    0.0880   0.0149   1.0638
30    0.0609   0.0102   1.0729



Table 9.2 CRM coefficients r_{γ_t,δ_t}(x, y) for different values of t, x and y.

                                     x
t.y        0       1       2       3       4        5        6
1        86 %   161 %   261 %   421 %   680 %   1097 %   1771 %
2.≥1     82 %   132 %   193 %   281 %   410 %    599 %    874 %
2.0         -       -   213 %   311 %   454 %    662 %    967 %
3.≥2     80 %   118 %   161 %   221 %   302 %    414 %    566 %
3.1         -       -   174 %   238 %   326 %    446 %    610 %
3.0         -       -       -   257 %   351 %    481 %    658 %

Table 9.3 Premium update coefficients derived from Bayesian credibility in the Poisson-Gamma model.

                                   x
t        0       1       2       3       4       5       6
1      86 %   182 %   279 %   375 %   472 %   568 %   665 %
2      75 %   160 %   244 %   329 %   413 %   498 %   582 %
3      67 %   142 %   217 %   292 %   367 %   442 %   518 %

We see from Tables 9.2 and 9.3 that the discounts awarded to the policyholders who did not report any claim (column x = 0 in Tables 9.2 and 9.3) are larger with Bayesian credibility than with CRM coefficients. Given the approximate financial stability evidenced in Table 9.1, the penalties induced by CRM coefficients must therefore be softer, on average, than those of Bayesian credibility. For instance, policyholders who reported a single claim (column x = 1 in Tables 9.2 and 9.3) face a premium surcharge ranging from 118 to 161 % with CRM coefficients, and ranging from 142 to 182 % with Bayesian credibility. However, policyholders reporting many claims are more heavily penalized with CRM coefficients than with Bayesian credibility. This comes from the convex behaviour of the CRM coefficients, whereas Bayesian credibility corrections are linear in the past number of claims. In the Poisson-Gamma (or Negative Binomial) case, the penalties corresponding to CRM coefficients are thus convex functions of the number of claims reported in the past, whereas the corrections induced by credibility mechanisms are linear in this number. Compared with credibility, the bonus-malus system grants smaller discounts, penalizes policyholders reporting a single claim to a lesser extent, but induces more severe premium corrections for those reporting at least two claims.
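These table entries are easy to verify; the Python lines below (an added check, with the Table 9.1 coefficients hard-coded) recompute a few of them.

```python
a, lam = 0.889, 0.1474
gammas = {1: 0.6145, 2: 0.4595, 3: 0.3690}    # from Table 9.1
deltas = {1: 0.1423, 2: 0.0955, 3: 0.0727}

def crm_coeff(t, x, y):
    """Table 9.2 entry: (1 + gamma_t)^x (1 - delta_t)^y, in percent."""
    return 100 * (1 + gammas[t]) ** x * (1 - deltas[t]) ** y

def bayes_coeff(t, x):
    """Table 9.3 entry: Poisson-Gamma credibility factor (a + x)/(a + lam t)."""
    return 100 * (a + x) / (a + lam * t)

print(round(crm_coeff(1, 0, 1)))    # 86, first cell of Table 9.2
print(round(crm_coeff(3, 1, 2)))    # 118, row 3.>=2 of Table 9.2
print(round(bayes_coeff(1, 1)))     # 182, as in Table 9.3
```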

To obtain unique values for the CRM coefficients, we have to decide on an age structure for the policies comprised in the portfolio. Here, we take the following hypothetical distribution of the portfolio:

a_1 = 10 %, a_5 = 20 %, a_12 = 30 %, a_20 = 20 %, a_25 = 10 %, a_30 = 10 %, and a_t = 0 for all other t.

The minimization of

E[Q(A)] = Σ_{t=1}^∞ a_t Q(t)

with respect to γ and δ then gives the optimal solutions

γ = 0.0710 and δ = 0.0133.

The financial equilibrium is approximately achieved, as the total premium income tends to 104.29 % of the initial one. When working with a weighted average of the Q(t)s, the values associated with large t play the prominent role, resulting in values for γ and δ similar to those obtained for t > 20 in Table 9.1.

With optimal CRM coefficients, the discount for claim-free policyholders is rather modest (1.33 % per claim-free year), but the penalty in case of a claim is also moderate (7.1 %). The large differences compared with the official values of today's bonus-malus system in France (5 % discount per claim-free year, and 25 % increase per claim) can be explained by the fact that all the penalties are suppressed after two claim-free years according to the terms of the French law, which is particularly generous.

We have also tested two different sets of a_t s, to study the influence of the age structure of the portfolio on the optimal CRM coefficients. With the age structure of an 'old' portfolio, that is,

a_1 = 10 %, a_5 = 10 %, a_12 = 10 %, a_20 = 20 %, a_25 = 20 %, a_30 = 30 %, and a_t = 0 for all other t,

the minimization of E[Q(A)] gives

γ = 0.0658 and δ = 0.0115.

With the age structure of a 'young' portfolio, that is,

a_1 = 30 %, a_5 = 20 %, a_12 = 20 %, a_20 = 10 %, a_25 = 10 %, a_30 = 10 %, and a_t = 0 for all other t,

the minimization of E[Q(A)] gives

γ = 0.0694 and δ = 0.0132.

The influence of the age structure on the optimal CRM coefficients is thus rather moderate.

9.3 Partial Liability

9.3.1 Reduced Penalty and Modelling Claim Frequencies

The French bonus-malus system possesses many particular rules. This section is devoted to the study of one of them. Specifically, according to the terms of the French law, if the policyholder is only partially liable for the claim, then the premium is multiplied by 1.125 instead of 1.25. To take such a rule into account, we have to model the random couple (N_{1t}, N_{2t}), where N_{1t} counts the number of full-liability claims filed during year t and N_{2t} counts the number of partial-liability claims filed during the same year. Clearly, N_{1t} + N_{2t} is the total number of claims N_t used in the preceding section.

Let q be the probability that the policyholder is only partially liable for the claim he files. Further, let us assume a Bernoulli scheme for the claim types. This ensures that, conditionally on Θ, N_{1t} and N_{2t} are independent and both conform to the Poisson distribution (see Property 6.1). Specifically, we now have

Pr[N_{1t} = k | Θ = θ] = e^{−(1−q)λθ} ((1 − q)λθ)^k / k!, k = 0, 1, 2, …
Pr[N_{2t} = k | Θ = θ] = e^{−qλθ} (qλθ)^k / k!, k = 0, 1, 2, …

The random variables N_{1t} and N_{2t} are obviously dependent if the risk proneness Θ is unknown. The joint probability mass function of the random couple (N_{1t}, N_{2t}) is given by

Pr[N_{1t} = k_1, N_{2t} = k_2] = ∫_0^∞ Pr[N_{1t} = k_1 | Θ = θ] Pr[N_{2t} = k_2 | Θ = θ] f_Θ(θ) dθ.

This is a mixed bivariate Poisson model.
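For intuition, this bivariate structure can be simulated directly; the sketch below (ours) draws Θ from the Gamma structure function and thins claims into full- and partial-liability counts, illustrating that N_{1t} and N_{2t} are conditionally independent yet positively correlated unconditionally.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, a, q = 0.1474, 0.889, 0.2           # q = probability of partial liability

n = 500_000
theta = rng.gamma(a, 1.0 / a, size=n)    # Gamma structure function, unit mean
n1 = rng.poisson((1 - q) * lam * theta)  # full-liability claims
n2 = rng.poisson(q * lam * theta)        # partial-liability claims

# independent given Theta, but positively correlated unconditionally
print(np.corrcoef(n1, n2)[0, 1] > 0)     # True
```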

9.3.2 Computations of the CRMs at Time t

Let us consider a policyholder covered for t years. In addition to the parameter γ_t giving the magnitude of the penalty in case of a full-liability claim, we introduce the new parameter η_t giving the reduced penalty in case of a partial-liability claim. Now the CRM coefficient for the time period (0, t) is

r_{γ_t,η_t,δ_t}(N_{1•}, N_{2•}, I_{12•}, t) = (1 + γ_t)^{N_{1•}} (1 + η_t)^{N_{2•}} (1 − δ_t)^{I_{12•}}

with

N_{1•} = Σ_{j=1}^t N_{1j}, N_{2•} = Σ_{j=1}^t N_{2j}, and I_{12•} = Σ_{j=1}^t I_j,
where I_j = 1 if N_{1j} = N_{2j} = 0, and I_j = 0 otherwise.

We will assume that γ_t = ξ η_t, with ξ fixed by the actuary. The value of ξ describes the way a claim with full liability is penalized, compared to a claim with partial liability. The CRM coefficient then becomes

r_{η_t,δ_t}(N_{1•}, N_{2•}, I_{12•}, t) = (1 + ξ η_t)^{N_{1•}} (1 + η_t)^{N_{2•}} (1 − δ_t)^{I_{12•}}.

In order to obtain η_t and δ_t, we now have to minimize the objective function

Q(t) = E[(Θ − r_{η,δ}(N_{1•}, N_{2•}, I_{12•}, t))²]

with respect to the parameters η and δ. The first-order conditions are

E[Θ (1 − δ)^{I_{12•}} (ξ N_{1•} (1 + ξη)^{N_{1•}−1} (1 + η)^{N_{2•}} + N_{2•} (1 + ξη)^{N_{1•}} (1 + η)^{N_{2•}−1})]
  = E[(1 − δ)^{2I_{12•}} (ξ N_{1•} (1 + ξη)^{2N_{1•}−1} (1 + η)^{2N_{2•}} + N_{2•} (1 + ξη)^{2N_{1•}} (1 + η)^{2N_{2•}−1})]

and

E[Θ (1 + ξη)^{N_{1•}} (1 + η)^{N_{2•}} I_{12•} (1 − δ)^{I_{12•}−1}]
  = E[(1 + ξη)^{2N_{1•}} (1 + η)^{2N_{2•}} I_{12•} (1 − δ)^{2I_{12•}−1}].

Let us define

φ(z_1, z_2, z_3 | θ) = E[z_1^{N_{1•}} z_2^{N_{2•}} z_3^{I_{12•}} | Θ = θ]

to be the conditional probability generating function of the random vector (N_{1•}, N_{2•}, I_{12•}) given Θ = θ. We clearly have that

φ(z_1, z_2, z_3 | θ) = (e^{−λθ}(z_3 − 1) + e^{λθ((1−q)z_1 + qz_2 − 1)})^t.


It can be verified that the first-order conditions can be written as

2 E[Θ (ξ φ^{(1,0,0)} + φ^{(0,1,0)})(1 + ξη, 1 + η, 1 − δ | Θ)] = E[(ξ φ_2^{(1,0,0)} + φ_2^{(0,1,0)})(1 + ξη, 1 + η, 1 − δ | Θ)]
2 E[Θ φ^{(0,0,1)}(1 + ξη, 1 + η, 1 − δ | Θ)] = E[φ_2^{(0,0,1)}(1 + ξη, 1 + η, 1 − δ | Θ)]

where

φ^{(x,y,z)}(a, b, c | θ) = [∂^{x+y+z} φ(s, t, u | θ) / ∂s^x ∂t^y ∂u^z] evaluated at s = a, t = b, u = c,
φ_2^{(x,y,z)}(a, b, c | θ) = [∂^{x+y+z} φ(s², t², u² | θ) / ∂s^x ∂t^y ∂u^z] evaluated at s = a, t = b, u = c,

for x, y, z ∈ {0, 1}. Again, numerical procedures are needed to find the solution of this optimization problem.

9.3.3 Financial Equilibrium

Analysing the financial equilibrium of the system now amounts to checking whether E[r_{η_t,δ_t}(N_{1•}, N_{2•}, I_{12•}, t)] is equal to 1 for the optimal values η_t and δ_t. To this end, we need the joint distribution of the random vector (N_{1•}, N_{2•}, I_{12•}). The joint probability mass function of this vector is given in the following result, which extends Property 9.3 to the present setting.

Property 9.4 For fixed θ, the following recursive formulas

g^{⋆t}(x, y, z | θ) = f^{⋆t}(x, y, t − z | θ) for 0 ≤ z ≤ t, x, y ≥ 0 and x + y > 0
g^{⋆t}(0, 0, 0 | θ) = e^{−λθt}
f(x, y, 0 | θ) = e^{−λθ} (λθ)^{x+y} (1 − q)^x q^y / (x! y!) for x, y ≥ 0 and x + y > 0
f(0, 0, 1 | θ) = e^{−λθ}
g^{⋆t}(x, y, z | θ) = e^{λθ} Σ_{u=0}^x Σ_{v=0}^y ((t + 1)/z − 1) g^{⋆t}(x − u, y − v, z − 1 | θ) g(u, v, 1 | θ)
  for 1 ≤ z ≤ t, x, y ≥ 0 and x + y > z − 1

hold true, with the convention that the defined functions take the value 0 where they have not been defined.

Proof It is trivial that for t = 1 we have

f(x, y, 0 | θ) = e^{−λθ} (λθ)^{x+y} (1 − q)^x q^y / (x! y!), x + y > 0,
f(0, 0, 1 | θ) = e^{−λθ}.

As f^{⋆t} is the t-fold convolution of a lattice random vector, we will apply the trivariate extension of De Pril's algorithm described in Property 9.2. As this algorithm needs a mass at the origin, we define the auxiliary probability mass function

g^{⋆t}(x, y, z | θ) = f^{⋆t}(x, y, t − z | θ).

Using similar arguments as before, we obtain the following recursion:

g^{⋆t}(x, y, z | θ) = e^{λθ} Σ_{u=0}^x Σ_{v=0}^y ((t + 1)/z − 1) g^{⋆t}(x − u, y − v, z − 1 | θ) g(u, v, 1 | θ). □

9.3.4 Numerical Illustrations

To illustrate this special case, we use the same parameters as in Section 9.2.8. We assume that 20 % of the claims concern partial liability, that is, q = 0.2. We numerically solve the system of two equations for different values of ξ. Table 9.4 gives the results. As before, η_t and δ_t decrease with t, and an averaging is needed to get a unique set of parameters. The total income of the company is not much influenced by the value of ξ, and is quite close to the values listed in Table 9.1.

To obtain unique values for the CRM coefficients, we choose the first age distribution of Section 9.2.8. The minimization of

E[Q(A)] = Σ_{t=1}^∞ a_t Q(t)

then gives the values displayed in Table 9.5. The same comments apply to this case. Specifically, the Q(t)s with large values of t play the prominent role, giving optimal CRM coefficients close to the values obtained with t > 20 in Table 9.1.

The values of the optimal CRM coefficients displayed in Table 9.5 are again much smaller than those implemented by the French law. As before, this is due to the special bonus rule of the French system (after two consecutive years without claim, the driver goes back to the initial level of 100 %).

Table 9.4 Parameters η_t and δ_t and financial equilibrium for different values of t and ξ.

           ξ = 1.5                        ξ = 2.0                        ξ = 2.5
t     η_t      δ_t      Fin. eq.    η_t      δ_t      Fin. eq.    η_t      δ_t      Fin. eq.
1     0.4336   0.1423   0.9747      0.3316   0.1423   0.9729      0.2674   0.1423   0.9714
2     0.3253   0.0954   0.9870      0.2498   0.0953   0.9850      0.2022   0.0953   0.9832
3     0.2616   0.0727   0.9984      0.2014   0.0726   0.9965      0.1633   0.0726   0.9946
4     0.2194   0.0589   1.0083      0.1692   0.0589   1.0062      0.1374   0.0588   1.0047
10    0.1128   0.0279   1.0419      0.0873   0.0279   1.0402      0.0711   0.0279   1.0386
20    0.0627   0.0149   1.0634      0.0486   0.0149   1.0624      0.0397   0.0149   1.0617
30    0.0435   0.0102   1.0725      0.0337   0.0102   1.0707      0.0276   0.0102   1.0712



Table 9.5 Parameters η and δ and financial equilibrium for different values of ξ.

        ξ = 1.5                     ξ = 2.0                     ξ = 2.5
η        δ        Fin. eq.    η        δ        Fin. eq.    η        δ        Fin. eq.
0.0507   0.0133   1.0419      0.0393   0.0133   1.0403      0.0321   0.0133   1.0393

9.4 Further Reading and Bibliographic Notes

This chapter is based on Pitrebois, Denuit & Walhin (2006b). Despite its apparent difference from bonus-malus scales, the French bonus-malus system can be treated as a scale with many levels. Kelle (2000) followed this route, and used a Markov chain with 530 states to analyse the French system. The large number of states needed is due to the malus reduction in the case of claims with shared responsibility, forcing the author to consider the pair (number of claims with whole responsibility, number of claims with partial liability) to make the computation.

In this chapter, we did not consider all the characteristics of the bonus-malus system in force in France. We have disregarded the special bonus rule (which suppresses all the penalties after two claim-free years). The French law imposes other specific rules on insurance companies. For instance, the French bonus-malus system is such that drivers never pay more than 350 % of the base premium nor less than 50 % of the base premium. Therefore the minimization process has to be carried out with an adapted CRM coefficient of the form

r* = max(0.5, min(3.5, r)).

Several simplifying assumptions can be considered to ease the numerical computations. For instance, we could work with binary annual claim numbers: either the policyholder does not report any claim or he reports a single claim to the company. Such an assumption, replacing N_t by min(N_t, 1), leads to smaller discounts and higher penalties, which is a prudent strategy for the insurer.

Even if the vast majority of bonus-malus systems appear as scales in which policyholders move according to their claims history, there are some exceptions (such as the system in force in France). We refer the reader to Neuhaus (1988) for another example, where the malus after a claim is expressed as a fixed monetary amount (instead of a relativity). This interesting mechanism restores some fairness in the case of differentiated a priori price lists.

The multivariate version of Panjer's recursive formula has been derived by Sundt (1999) and Ambagaspitya (1999). Sundt (1999) provided a proof based on conditional expectations, whereas Ambagaspitya (1999) used a proof based on generating functions. Sundt (1999) also showed that the following recursive formula can be used:

f_S(s) = 1/(1 − a f_X(0)) Σ_{x≠0}^s (a + b (x_1 + ⋯ + x_k)/(s_1 + ⋯ + s_k)) f_X(x) f_S(s − x), s > 0.



The multivariate version of De Pril's recursive formula for the convolution of independent and identically distributed random vectors has been derived by Sundt (1999) as a particular case of the multivariate Panjer algorithm, and by Walhin (2001) as a particular case of the multivariate version of Dhaene & Vandebroek's (1995) recursive formula for the multivariate individual risk model. It can also be deduced from the multivariate extension of De Pril's methodology, as shown in Dickson & Waters (1999).


Bibliography

Albrecht, P. (1983a). Parametric multiple regression risk models: Connections with tariffication, especially in motor insurance. Insurance: Mathematics and Economics 2, 113–117.
Albrecht, P. (1983b). Parametric multiple regression risk models: Theory and statistical analysis. Insurance: Mathematics and Economics 2, 49–66.
Albrecht, P. (1983c). Parametric multiple regression risk models: Some connections with IBNR. Insurance: Mathematics and Economics 2, 69–73.
Albrecht, P. (1984). Laplace transforms, Mellin transforms and mixed Poisson processes. Scandinavian Actuarial Journal, 58–64.
Albrecht, P. (1985). An evolutionary credibility model for claim numbers. ASTIN Bulletin 15, 1–17.
Ambagaspitya, R.S. (1999). On the distributions of two classes of correlated aggregate claims. Insurance: Mathematics and Economics 24, 255–263.
Andrade e Silva, J.M., & Centeno, M. (2005). A note on bonus scales. Journal of Risk and Insurance 72, 601–607.
Angers, J.-F., Desjardins, D., Dionne, G., & Guertin, F. (2006). Vehicle and fleet random effects in a model of insurance rating for fleets of vehicles. ASTIN Bulletin 36, 25–77.
Antonio, K., & Beirlant, J. (2007). Actuarial statistics with generalized linear mixed models. Insurance: Mathematics and Economics 40, 58–76.
Arnold, B.C., Castillo, E., & Sarabia, J.M. (1999). Conditional Specification of Statistical Models. Springer, New York.
Beirlant, J., Derveaux, V., De Meyer, A.M., Goovaerts, M.J., Labies, E., & Maenhoudt, B. (1991). Statistical risk evaluation applied to (Belgian) car insurance. Insurance: Mathematics and Economics 10, 289–302.
Beirlant, J., Goegebeur, Y., Segers, J., & Teugels, J. (2004). Statistics of Extremes: Theory and Applications. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd.
Bermúdez, L., Denuit, M., & Dhaene, J. (2000). Exponential bonus-malus systems integrating a priori risk classification. Journal of Actuarial Practice 9, 84–112.
Besson, J.L., & Partrat, C. (1990). Loi de Poisson inverse gaussienne et systèmes de bonus-malus. Proceedings of the Astin Colloquium, Montreux 81, 418–419.
Bolancé, C., Guillén, M., & Pinquet, J. (2003). Time-varying credibility for frequency risk models: Estimation and tests for autoregressive specification on the random effect. Insurance: Mathematics and Economics 33, 273–282.
Bonsdorff, H. (1992). On the convergence rate of bonus-malus systems. ASTIN Bulletin 22, 217–223.




Bonsdorff, H. (2005). On asymptotic properties of bonus-malus systems based on the number and the size of the claims. Scandinavian Actuarial Journal, 309–320.
Borgan, O., Hoem, J.M., & Norberg, R. (1981). A nonasymptotic criterion for the evaluation of automobile bonus systems. Scandinavian Actuarial Journal, 165–178.
Boskov, M., & Verrall, R.J. (1994). Premium rating by geographical area using spatial models. ASTIN Bulletin 24, 131–143.
Boucher, J.-Ph., & Denuit, M. (2006). Fixed versus random effects in Poisson regression models for claim counts: a case study with motor insurance. ASTIN Bulletin 36, 285–301.
Boucher, J.-Ph., Denuit, M., & Guillén, M. (2006). Risk classification for claim counts: Zero-inflated mixed Poisson and hurdle models. Working Paper 06-06, Institut des Sciences Actuarielles, Université Catholique de Louvain, Louvain-la-Neuve, Belgium.
Brockman, M.J., & Wright, T.S. (1992). Statistical motor rating: Making effective use of your data. Journal of the Institute of Actuaries 119, 457–543.
Brouhns, N., Denuit, M., Masuy, B., & Verrall, R.J. (2002). Ratemaking by geographical area in the Boskov and Verrall model: a case study using Belgian car insurance data. actu-L 2, 3–28.
Brouhns, N., Guillén, M., Denuit, M., & Pinquet, J. (2003). Bonus-malus scales in segmented tariffs with stochastic migration between segments. Journal of Risk and Insurance 70, 577–599.
Buch-Kromann, T. (2006). Estimation of large insurance losses: A case study. Journal of Actuarial Practice 13, 191–211.
Buch-Larsen, T., Nielsen, J.P., Guillén, M., & Bolancé, C. (2005). Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics 39, 503–518.
Bühlmann, H. (1967). Experience rating and credibility. ASTIN Bulletin 4, 199–207.
Bühlmann, H. (1970). Mathematical Methods in Risk Theory. Springer, New York.
Bühlmann, P., & Bühlmann, H. (1999). Selection of credibility regression models. ASTIN Bulletin 29, 245–270.
Bühlmann, H., & Gisler, A. (2005). A Course in Credibility Theory and its Applications. Springer, Berlin.
Burnham, K.P., & Anderson, D.R. (2002). Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. Springer, New York.
Butler, P. (1993). Cost-based pricing of individual automobile risk transfer: Car-mile exposure unit analysis. Journal of Actuarial Practice 1, 51–84 (with discussion).
Cameron, A.C., & Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge University Press.
Carrière, J. (1993a). Nonparametric tests for mixed Poisson distributions. Insurance: Mathematics and Economics 12, 3–8.
Carrière, J. (1993b). A semiparametric estimator of a risk distribution. Insurance: Mathematics and Economics 13, 75–81.
Carter, M., & Van Brunt, B. (2000). The Lebesgue-Stieltjes Integral. A Practical Introduction. Springer, New York.
Cebrian, A., Denuit, M., & Lambert, Ph. (2003). Generalized Pareto fit to the Society of Actuaries' large claims database. North American Actuarial Journal 7, 18–36.
Centeno, M., & Andrade e Silva, J.M.A. (2001). Bonus systems in an open portfolio. Insurance: Mathematics and Economics 28, 341–350.
Centeno, M., & Andrade e Silva, J.M.A. (2002). Optimal bonus scales under path-dependent bonus rules. Scandinavian Actuarial Journal, 615–627.
Chappell, D., & Norman, J.M. (1989). Optimal, near-optimal and rule-of-thumb claiming rules for a protected bonus vehicle insurance policy. European Journal of Operations Research 41, 151–156.
Consul, P.C. (1990). A model for distributions of injuries in auto-accidents. Bulletin of the Swiss Association of Actuaries, 161–168.
Cooray, K., & Ananda, M.M.A. (2005). Modeling actuarial data with a composite lognormal-Pareto model. Scandinavian Actuarial Journal, 321–334.
Coutts, S. (1984). Motor premium rating. Insurance: Mathematics and Economics 3, 73–96.
Cummins, D.J., Dionne, G., McDonald, J.B., & Pritchett, M.B. (1990). Application of the GB2 family of distributions in modeling insurance loss processes. Insurance: Mathematics and Economics 9, 257–272.


Daengdej, J., Lukose, D., & Murison, R. (1999). Using statistical models and case-based reasoning in claims prediction: Experience from a real-world problem. Knowledge-Based Systems 12, 239–245.
Dalgaard, P. (2002). Introductory Statistics with R. Springer, New York.
Dannenburg, D.R., Kaas, R., & Goovaerts, M.J. (1996). Practical Actuarial Credibility Models. Institute of Actuarial Science and Econometrics, University of Amsterdam, Amsterdam, The Netherlands.
Dean, C.B., Lawless, J.F., & Willmot, G.E. (1989). A mixed Poisson-inverse-Gaussian regression model. The Canadian Journal of Statistics 17, 171–182.
De Boor, C. (1978). A Practical Guide to Splines. Springer, New York.
De Leve, G., & Weeda, P.J. (1968). Driving with Markov programming. ASTIN Bulletin 5, 62–86.
Dellaert, N.P., Frenk, J.B.G., Kouwenhoven, A., & Van Der Laan, B.S. (1990). Optimal claim behaviour for third party liability insurances, or to claim or not to claim: that is the question. Insurance: Mathematics and Economics 9, 59–76.
Dellaert, N.P., Frenk, J.B.G., & van Rijsoort, L.P. (1993). Optimal claim behaviour for vehicle damage insurance. Insurance: Mathematics and Economics 12, 225–244.
Dellaert, N.P., Frenk, J.B.G., & Voskol, E. (1991). Optimal claim behaviour for third party liability insurances with perfect information. Insurance: Mathematics and Economics 10, 145–151.
Denuit, M. (1997). A new distribution of Poisson-type for the number of claims. ASTIN Bulletin 27, 229–242.
Denuit, M. (2002). S-convex extrema, Taylor-type expansions and stochastic approximations. Scandinavian Actuarial Journal, 45–67.
Denuit, M., De Vylder, F.E., & Lefèvre, Cl. (1999). Extremal generators and extremal distributions for the continuous s-convex stochastic orderings. Insurance: Mathematics and Economics 24, 201–217.
Denuit, M., & Dhaene, J. (2001). Bonus-Malus scales using exponential loss functions. German Actuarial Bulletin 25, 13–27.
Denuit, M., Dhaene, J., Goovaerts, M.J., & Kaas, R. (2005). Actuarial Theory for Dependent Risks: Measures, Orders and Models. John Wiley & Sons, Inc., New York.
Denuit, M., & Lambert, Ph. (2001). Smoothed NPML estimation of the risk distribution underlying Bonus-Malus systems. Proceedings of the Casualty Actuarial Society 88, 142–174.
Denuit, M., & Lang, S. (2004). Nonlife ratemaking with Bayesian GAM's. Insurance: Mathematics and Economics 35, 627–647.
Denuit, M., Maréchal, X., Pitrebois, S., & Walhin, J.-F. (2007a). Claiming behaviour in motor insurance with bonus-malus systems. Working Paper, Institut des Sciences Actuarielles, UCL, Louvain-la-Neuve, Belgium.
Denuit, M., Maréchal, X., Pitrebois, S., & Walhin, J.-F. (2007b). Actuarial analysis of some special rules in bonus-malus systems. Working Paper, Institut des Sciences Actuarielles, UCL, Louvain-la-Neuve, Belgium.
De Pril, N. (1978). The efficiency of a Bonus-Malus system. ASTIN Bulletin 10, 59–72.
De Pril, N. (1979). Optimal claim decisions for a Bonus-Malus system: A continuous approach. ASTIN Bulletin 10, 215–222.
De Pril, N., & Goovaerts, M.J. (1983). Bounds for the optimal critical claim size of a bonus system. Insurance: Mathematics and Economics 2, 27–32.
Der, G., & Everitt, B. (2002). A Handbook of Statistical Analyses using SAS. Chapman & Hall/CRC, Boca Raton.
Desjardins, D., Dionne, G., & Pinquet, J. (2001). Experience rating schemes for fleets of vehicles. ASTIN Bulletin 31, 81–105.
De Vylder, F.E. (1985). Non-linear regression in credibility theory. Insurance: Mathematics and Economics 4, 163–172.
De Vylder, F.E. (1996). Advanced Risk Theory. A Self-Contained Introduction. Editions de l'Université Libre de Bruxelles - Swiss Association of Actuaries, Bruxelles.
De Wit, G.W., & Van Eeghen, J. (1984). Rate making and society's sense of fairness. ASTIN Bulletin 14, 151–163.
Dhaene, J., & Vandebroek, M. (1995). Recursions for the individual model. Insurance: Mathematics and Economics 16, 31–38.
Dharmadhikari, S.W., & Joag-Dev, K. (1988). Unimodality, Convexity and Applications. Academic Press, New York.
Dickson, D.C.M., & Waters, H.R. (1999). Multi-period aggregate loss distributions for a life portfolio. ASTIN Bulletin 29, 295–309.
Dimakos, X.K., & Rattalma, A.F. (2002). Bayesian premium rating with latent structure. Scandinavian Actuarial Journal, 162–184.
Dionne, G., Artis, M., & Guillén, M. (1996). Count data models for a credit scoring system. Journal of Empirical Finance 3, 303–325.
Dionne, G., & Vanasse, C. (1989). A generalization of actuarial automobile insurance rating models: the Negative Binomial distribution with a regression component. ASTIN Bulletin 19, 199–212.
Dionne, G., & Vanasse, C. (1992). Automobile insurance ratemaking in the presence of asymmetrical information. Journal of Applied Econometrics 7, 149–165.
Dixon, M., Kelsey, R., & Verrall, R. (2000). Postcode insurance rating: spatial modelling and performance evaluation. Paper presented at the 4th IME Congress, Barcelona.
Dufresne, F. (1988). Distribution stationnaire d'un système bonus-malus et probabilité de ruine. ASTIN Bulletin 18, 31–46.
Dufresne, F. (1995). The efficiency of the Swiss Bonus-Malus system. Bulletin of the Swiss Association of Actuaries, 29–41.
Elvers, E. (1991). A note on the Generalized Poisson distribution. ASTIN Bulletin 21, 167.
Fahrmeir, L., Lang, S., & Spies, F. (2003). Generalized geoadditive models for insurance claims data. German Actuarial Bulletin 26, 7–23.
Fahrmeir, L., & Tutz, G. (2001). Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York.
Faraway, J.J. (2006). Extending the Linear Model with R. Chapman & Hall/CRC, Boca Raton.
Feller, W. (1971). An Introduction to Probability Theory and its Applications (Vol. II). John Wiley & Sons, Inc., New York.
Ferreira, J. (1977). Identifying equitable insurance premiums for risk classes: an alternative to the classical approach. Lecture presented at the 23rd international meeting of the Institute of Management Sciences, Athens, Greece.
Franklin, C.H. (2005). Maximum likelihood estimation. In: Encyclopedia of Social Measurement, Vol. 2, pp. 653–664. John Wiley & Sons, Inc., New York.
Frees, E.W. (2003). Multivariate credibility for aggregate loss models. North American Actuarial Journal 7, 13–27.
Frees, E.W., & Wang, P. (2005). Credibility using copulas. North American Actuarial Journal 9, 31–48.
Frees, E.W., & Wang, P. (2006). Copula credibility for aggregate loss models. Insurance: Mathematics and Economics 38, 360–373.
Frees, E.W., Young, V.R., & Luo, Y. (1999). A longitudinal data analysis interpretation of credibility models. Insurance: Mathematics and Economics 24, 229–247.
French, E., & Jones, J.B. (2004). On the distribution and dynamics of health care costs. Journal of Applied Econometrics 19, 705–722.
Gerber, H.U., & Jones, D. (1975). Credibility formulas of the updating type. Transactions of the Society of Actuaries 27, 31–52.
Gertensgarbe, F.W., & Werner, P.C. (1989). A method for the statistical definition of extreme-value regions and their application to meteorological time series. Zeitschrift für Meteorologie 39, 224–226.
Gilde, V., & Sundt, B. (1989). On bonus systems with credibility scales. Scandinavian Actuarial Journal, 13–22.
Golden, R.M. (2003). Discrepancy risk model selection test theory for comparing possibly misspecified or nonnested models. Psychometrika 68, 229–249.
Goovaerts, M.J., & Hoogstad, W. (1987). Credibility Theory. Surveys of Actuarial Studies, Nationale Nederlanden N.V., Rotterdam.
Gossiaux, A.-M., & Lemaire, J. (1981). Méthodes d'ajustement de distributions de sinistres. Bulletin of the Swiss Association of Actuaries, 87–95.
Grandell, J. (1997). Mixed Poisson Processes. Chapman & Hall, New York.
Greenwood, M., & Yule, G.U. (1920). An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society, Series A 83, 255–279.
Grenander, U. (1957a). On the heterogeneity in non-life insurance. Scandinavian Actuarial Journal, 153–179.
Grenander, U. (1957b). Some remarks on bonus systems in automobile insurance. Scandinavian Actuarial Journal, 180–197.
Hachemeister, C.A. (1975). Credibility for regression models with application to trend. In: P.M. Kahn, Editor, Credibility: Theory and Applications, Academic Press, New York, pp. 129–163.
Haehling von Lanzenauer, C. (1974). Optimal claim decisions by policyholders in automobile insurance with merit rating structures. Operations Research 22, 979–990.
Hastings, N.A.J. (1976). Optimal claiming on vehicle insurance. Operational Research Quarterly 27, 908–913.
Henriet, D., & Rochet, J.C. (1986). La logique des systèmes bonus-malus en assurance automobile: une approche théorique. Annales d'Économie et de Statistique 1, 133–152.
Heras, A., Gil, J.A., Garcia-Pineda, P., & Vilar, J.C. (2004). An application of linear programming to bonus-malus system design. ASTIN Bulletin 34, 435–456.
Heras, A., Vilar, J.L., & Gil, J.A. (2002). Asymptotic fairness of bonus-malus systems and optimal scales of premiums. The Geneva Papers on Risk and Insurance - Theory 27, 61–82.
Herzog, T.N. (1994). Introduction to Credibility Theory. ACTEX Publications.
Hey, J.D. (1985). No claims bonus. The Geneva Papers on Risk and Insurance 10, 209–228.
Hinde, J. (1982). Compound Poisson regression models. In: R. Gilchrist, Editor, GLIM 82: Proceedings of the International Conference on Generalised Linear Models, Springer, New York.
Holtan, J. (1994). Bonus made easy. ASTIN Bulletin 24, 61–74.
Holtan, J. (2001). Optimal loss financing under bonus-malus contracts. ASTIN Bulletin 31, 161–173.
Hsiao, C. (2003). Analysis of Panel Data. Cambridge University Press, Cambridge.
Huang, X., Song, L., & Liang, Y. (2003). Semiparametric credibility ratemaking using a piecewise linear prior. Insurance: Mathematics and Economics 33, 585–593.
Hürlimann, W. (1990). On maximum likelihood estimation for count data models. Insurance: Mathematics and Economics 9, 39–49.
Islam, M.N., & Consul, P.C. (1992). A probabilistic model for automobile claims. Bulletin of the Swiss Association of Actuaries, 85–93.
Jee, B. (1989). A comparative analysis of alternative pure premium models in the automobile risk classification system. Journal of Risk and Insurance 56, 434–459.
Jewell, W.S. (1975). Model variations in credibility theory. In: P.M. Kahn, Editor, Credibility: Theory and Applications, Academic Press, New York, pp. 199–244.
Johnson, N.L., Kotz, S., & Kemp, A.W. (1992). Univariate Discrete Distributions. John Wiley & Sons, Inc., New York.
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, Series B 49, 127–162.
Jorgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall, London.
Jorgensen, B., & Paes de Souza, M.C. (1994). Fitting Tweedie's compound Poisson model to insurance claims data. Scandinavian Actuarial Journal, 69–93.
Karlis, D. (2005). EM algorithm for mixed Poisson and other discrete distributions. ASTIN Bulletin 35, 3–24.
Kelle, M. (2000). Modélisation du système de bonus malus français. Bulletin Français d'Actuariat 4, 45–64.
Kendall, M., & Stuart, A. (1977). Advanced Theory of Statistics, Vol. I. Griffin, London.
Kestemont, R.-M., & Paris, J. (1985). Sur l'ajustement du nombre de sinistres. Bulletin of the Swiss Association of Actuaries, 157–163.
Klugman, S. (1992). Bayesian Statistics in Actuarial Science. Kluwer, Boston.
Klugman, S., Panjer, H., & Willmot, G. (2004). Loss Models: From Data to Decisions. John Wiley & Sons, Inc., New York.
Kolderman, J., & Volgenant, A. (1985). Optimal claiming in an automobile insurance system with bonus-malus structure. Journal of the Operational Research Society 36, 239–247.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34, 1–14.
Lee, Y., & Nelder, J.A. (1996). Hierarchical generalized linear models. Journal of the Royal Statistical Society, Series B 58, 619–678.
Lefèvre, Cl., & Picard, Ph. (1996). On the first crossing of a Poisson process in a lower boundary. In: C.C. Heyde, Yu. V. Prohorov, R. Pyke and S.T. Rachev, Editors, Athens Conference on Applied Probability and Time Series, Vol. I, Applied Probability. Lecture Notes in Statistics 114, Springer, Berlin, pp. 159–175.
Lemaire, J. (1976). Driver versus company: Optimal behaviour of the policyholder. Scandinavian Actuarial Journal, 209–219.
Lemaire, J. (1977). La soif du bonus. ASTIN Bulletin 9, 181–190.
Lemaire, J. (1979). How to define a Bonus-Malus system with an exponential utility function. ASTIN Bulletin 10, 274–282.
Lemaire, J. (1995). Bonus-Malus Systems in Automobile Insurance. Kluwer Academic Publishers, Boston.
Lemaire, J., & Vandermeulen, E. (1983). Une propriété du principe de l'espérance mathématique. Bulletin Trimestriel de l'Institut des Actuaires Français, 5–14.
Lemaire, J., & Zi, H. (1994a). High deductibles instead of Bonus-Malus. Can it work? ASTIN Bulletin 24, 75–88.
Lemaire, J., & Zi, H. (1994b). A comparative analysis of 30 Bonus-Malus systems. ASTIN Bulletin 24, 287–309.
Liang, K.Y., & Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Lindsay, B. (1989a). On the determinants of moment matrices. Annals of Statistics 17, 711–721.
Lindsay, B. (1989b). Moment matrices: applications in mixtures. Annals of Statistics 17, 722–740.
Lindsay, B. (1995). Mixture Models: Theory, Geometry and Applications. Institute of Mathematical Statistics and the American Statistical Association.
Lo, C.H., Fung, W.K., & Zhu, Z.Y. (2006). Generalized estimating equations for variance and covariance parameters in regression credibility models. Insurance: Mathematics and Economics 39, 99–113.
Loimaranta, K. (1972). Some asymptotic properties of bonus systems. ASTIN Bulletin 6, 223–245.
Lord, D. (2006). Modeling motor vehicle crashes using Poisson-gamma models: Examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Accident Analysis and Prevention 38, 751–766.
Luo, Y., Young, V.R., & Frees, E.W. (2004). Credibility ratemaking using collateral information. Scandinavian Actuarial Journal, 448–461.
Makov, U.E. (2002). Principal applications of Bayesian methods in actuarial science: A perspective. North American Actuarial Journal 5, 53–73 (with discussion).
Makov, U.E., Smith, A.F.M., & Liu, Y.H. (1996). Bayesian methods in actuarial science. The Statistician 45, 503–517.
Martin-Löf, A. (1973). A method for finding the optimal decision rule for a policy holder of an insurance with a bonus system. Scandinavian Actuarial Journal, 23–29.
McCullagh, P., & Nelder, J.A. (1989). Generalized Linear Models. Chapman & Hall, New York.
Morillo, I., & Bermudez, L. (2003). Bonus-malus systems using an exponential loss function with an Inverse Gaussian distribution. Insurance: Mathematics and Economics 33, 49–57.
Mowbray, A.H. (1914). How extensive a payroll exposure is necessary to give a dependable pure premium. Proceedings of the Casualty Actuarial Society 1, 24–30.
Nelder, J.A., & Verrall, R.J. (1997). Credibility theory and generalized linear models. ASTIN Bulletin 27, 71–82.
Nelder, J.A., & Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A 135, 370–384.
Neuhaus, W. (1988). A bonus-malus system in automobile insurance. Insurance: Mathematics and Economics 7, 103–112.
Norberg, R. (1975). Credibility premium plans which make allowance for bonus hunger. Scandinavian Actuarial Journal, 73–86.
Norberg, R. (1976). A credibility theory for automobile bonus systems. Scandinavian Actuarial Journal, 92–107.
Norberg, R. (1980). Empirical Bayes credibility. Scandinavian Actuarial Journal, 177–194.
Norberg, R. (1986). Hierarchical credibility: Analysis of a random effect linear model with nested classification. Scandinavian Actuarial Journal, 204–222.
Norberg, R. (2004). Credibility Theory. In: J. Teugels and B. Sundt, Editors, Encyclopedia of Actuarial Science, John Wiley & Sons, Ltd, pp. 398–406.
Norman, J.M., & Shearn, D.C.S. (1980). Optimal claiming on vehicle insurance revisited. Journal of the Operational Research Society 31, 181–186.
Panjer, H.H. (1981). Recursive evaluation of a family of compound distributions. ASTIN Bulletin 12, 22–26.
Panjer, H.H. (1987). Models of claim frequency. In: I.B. MacNeill and G.J. Umphrey, Editors, Advances in the Statistical Sciences, Vol. VI, Actuarial Sciences, The University of Western Ontario Series in Philosophy of Science 39, Reidel, Dordrecht, pp. 115–122.
Philipson, C. (1960). The Swedish system of bonus. ASTIN Bulletin 1, 134–141.
Picard, P. (1976). Généralisation de l'étude sur la survenance des sinistres en assurance automobile. Bulletin Trimestriel de l'Institut des Actuaires Français, 204–267.
Pinquet, J. (1997). Allowance for cost of claims in Bonus-Malus systems. ASTIN Bulletin 27, 33–57.
Pinquet, J. (1998). Designing optimal bonus-malus systems from different types of claims. ASTIN Bulletin 28, 205–220.
Pinquet, J. (2000). Experience rating through heterogeneous models. In: G. Dionne, Editor, Handbook of Insurance, Kluwer Academic Publishers.
Pinquet, J., Guillén, M., & Bolancé, C. (2001). Allowance for the age of claims in bonus-malus systems. ASTIN Bulletin 31, 337–348.
Pitrebois, S., Denuit, M., & Walhin, J.-F. (2003a). Fitting the Belgian Bonus-Malus system. Belgian Actuarial Bulletin 3, 58–62.
Pitrebois, S., Denuit, M., & Walhin, J.-F. (2003b). Setting a bonus-malus scale in the presence of other rating factors: Taylor's work revisited. ASTIN Bulletin 33, 419–436.
Pitrebois, S., Denuit, M., & Walhin, J.-F. (2004). Bonus-malus scales in segmented tariffs: Gilde & Sundt's work revisited. Australian Actuarial Journal 10, 107–125.
Pitrebois, S., Denuit, M., & Walhin, J.-F. (2005). Bonus-malus systems with varying deductibles. ASTIN Bulletin 35, 261–274.
Pitrebois, S., Denuit, M., & Walhin, J.-F. (2006a). Multi-event bonus-malus scales. Journal of Risk and Insurance 73, 517–528.
Pitrebois, S., Denuit, M., & Walhin, J.-F. (2006b). An actuarial analysis of the French bonus-malus system. Scandinavian Actuarial Journal, 247–264.
Pitrebois, S., Walhin, J.-F., & Denuit, M. (2006c). How to transfer policyholders from one bonus-malus scale to the other? German Actuarial Bulletin 27, 607–618.
Pohlmeier, W., & Ulrich, V. (1995). An econometric model of the two-part decision making process in the demand for health care. Journal of Human Resources 30, 339–361.
Purcaru, O., & Denuit, M. (2003). Dependence in dynamic claim frequency credibility models. ASTIN Bulletin 33, 23–40.
Qian, W. (2000). An application of nonparametric regression estimation in credibility theory. Insurance: Mathematics and Economics 27, 169–176.
Quigley, J., Bedford, T., & Walls, L. (2006). Estimating rate of occurrence of rare events with empirical Bayes: A railway application. Reliability Engineering and System Safety, in press.
Raftery, A.E. (1995). Bayesian model selection in social research. In: P. Marsden, Editor, Sociological Methodology, The American Sociological Association, Washington, pp. 111–163.
Rejesus, R.M., Coble, K.H., Knight, T.O., & Jin, Y. (2006). Developing experience-based premium rate discounts in crop insurance. American Journal of Agricultural Economics 88, 409–419.
Retzlaff-Roberts, D., & Puelz, R. (1996). Classification in automobile insurance using a DEA and discriminant analysis hybrid. Journal of Productivity Analysis 7, 417–427.
Roberts, A.W., & Varberg, D.E. (1973). Convex Functions. Academic Press, New York.
Rolski, T., Schmidli, H., Schmidt, V., & Teugels, J. (1999). Stochastic Processes for Insurance and Finance. John Wiley & Sons, Inc., New York.
Ruohonen, M. (1987). On a model for the claim number process. ASTIN Bulletin 18, 57–68.
Santos Silva, J.M.C., & Windmeijer, F. (2001). Two-part multiple spell models for health care demand. Journal of Econometrics 104, 67–89.
Sarabia, J.M., Gomez-Deniz, E., & Vazquez-Polo, F.J. (2004). On the use of conditional specification models in claim count distributions: An application to bonus-malus systems. ASTIN Bulletin 34, 85–98.
Shaked, M. (1980). On mixtures from exponential families. Journal of the Royal Statistical Society, Series B 42, 192–198.
Sharif, A.H., & Panjer, H.H. (1993). A probabilistic model for automobile claims: a comment on the article by M.N. Islam and P.C. Consul. Bulletin of the Swiss Association of Actuaries, 279–282.
Shengwang, M., Wei, Y., & Whitmore, G.A. (1999). Accounting for individual over-dispersion in a Bonus-Malus automobile insurance system. ASTIN Bulletin 29, 327–337.
Simar, L. (1976). Maximum likelihood estimation of a compound Poisson process. The Annals of Statistics 4, 1200–1209.
Smyth, G.K., & Jorgensen, B. (2002). Fitting Tweedie's compound Poisson model to insurance claims data: Dispersion modelling. ASTIN Bulletin 32, 143–157.
Subramanian, K. (1998). Bonus-malus systems in a competitive environment. North American Actuarial Journal 2, 38–44.
Sundt, B. (1981). Recursive credibility estimation. Scandinavian Actuarial Journal, 3–21.
Sundt, B. (1983). Parameter estimation in some credibility models. Scandinavian Actuarial Journal, 239–255.
Sundt, B. (1988). Credibility estimators with geometric weights. Insurance: Mathematics and Economics 7, 113–122.
Sundt, B. (1999). On multivariate Panjer recursions. ASTIN Bulletin 29, 29–45.
Taylor, G.C. (1989). Use of spline functions for premium rating by geographic area. ASTIN Bulletin 19, 91–122.
Taylor, G.C. (1996). Geographic premium rating by Whittaker spatial smoothing. ASTIN Bulletin 31, 147–160.
Taylor, G. (1997). Setting a bonus-malus scale in the presence of other rating factors. ASTIN Bulletin 27, 319–327.
ter Berg, P. (1980a). Two pragmatic approaches to loglinear claim cost analysis. ASTIN Bulletin 11, 77–90.
ter Berg, P. (1980b). On the loglinear Poisson and gamma model. ASTIN Bulletin 11, 35–40.
ter Berg, P. (1996). A loglinear Lagrangian Poisson model. ASTIN Bulletin 26, 123–129.
Titterington, D.M., Smith, A.F.M., & Makov, U.E. (1985). Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons Ltd, Chichester.
Tremblay, L. (1992). Using the Poisson-inverse Gaussian distribution in bonus-malus systems. ASTIN Bulletin 22, 97–106.
Tucker, H.G. (1963). An estimate of the compounding distribution of a compound Poisson distribution. Theory of Probability and Its Applications 8, 195–200.
Tweedie, M.C.K. (1984). An index which distinguishes between some important exponential families. In: J.K. Ghosh and J. Roy, Editors, Statistics: Applications and New Directions, Proceedings of the Indian Statistical Institute Golden Jubilee International Conference, pp. 579–604.
Vandebroek, M. (1993). Bonus-malus system or partial coverage to oppose moral hazard problems? Insurance: Mathematics and Economics 13, 1–5.
Viswanathan, K.S., & Lemaire, J. (2005). Bonus-malus systems in a deregulated environment: Forecasting market shares using diffusion models. ASTIN Bulletin 35, 299–319.
Vuong, Q.H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57, 307–333.
Walhin, J.-F. (2001). Some comments on the individual risk model and multivariate extension. German Actuarial Bulletin 25, 257–270.
Walhin, J.-F., & Paris, J. (1999). Using mixed Poisson distributions in connection with Bonus-Malus systems. ASTIN Bulletin 29, 81–99.
Walhin, J.-F., & Paris, J. (2000). The true claim amount and frequency distributions within a bonus-malus system. ASTIN Bulletin 30, 391–403.
Walhin, J.-F., & Paris, J. (2001). The practical replacement of a bonus-malus system. ASTIN Bulletin 31, 317–335.
Wedel, M., Böckenholt, U., & Kamakura, W.A. (2003). Factor models for multivariate count data. Journal of Multivariate Analysis 87, 356–369.
Weisberg, H.I., Tomberlin, T.J., & Chatterjee, S. (1984). Predicting insurance losses under cross-classification: A comparison of alternative approaches. Journal of Business & Economic Statistics 2, 170–178.
Whitney, W. (1918). The theory of experience rating. Proceedings of the Casualty Actuarial Society 4, 274–292.
Williams, G.J., & Huang, Z. (1996). KDD for insurance risk assessment: A case study. CSIRO Mathematical and Information Sciences.
Willmot, G.E. (1987). The Poisson-inverse Gaussian distribution as an alternative to the negative binomial. Scandinavian Actuarial Journal, 113–127.
Winkelmann, R. (2003). Econometric Analysis of Count Data. Springer, New York.
Yeo, A.C., Smith, K.A., Willis, R.J., & Brooks, M. (2001). Clustering technique for risk classification and prediction of claim costs in the automobile insurance industry. International Journal of Intelligent Systems in Accounting, Finance and Management 10, 39–50.
Yeo, K.L., & Valdez, E.A. (2006). Claim dependence with common effects in credibility models. Insurance: Mathematics and Economics 38, 609–629.
Yip, K.C.H., & Yau, K.K.W. (2005). On modeling claim frequency data in general insurance with extra zeros. Insurance: Mathematics and Economics 36, 153–163.
Young, V.R. (1997). Credibility using semiparametric models. ASTIN Bulletin 27, 273–285.
Young, V.R. (1998a). Credibility using a loss function from spline theory: parametric models with a one-dimensional sufficient statistic. North American Actuarial Journal 2, 101–117 (with discussion).
Young, V.R. (1998b). Robust Bayesian credibility using semiparametric models. ASTIN Bulletin 28, 187–203.
Young, V.R. (2000). Credibility using semiparametric models and a loss function with a constancy penalty. Insurance: Mathematics and Economics 26, 151–156.
Young, V.R., & De Vylder, F.E. (2000). Credibility in favor of unlucky insureds. North American Actuarial Journal 4, 107–113.
Index

Binomial distribution 13–15, 271–2, 281, 333
Bonus hunger 51–2, 220–1, 246–54, 257–8
Chapman-Kolmogorov equations 174
De Pril algorithm 330–3, 334, 341
Deductible 168–9, 251, 276–91
Deviance 71, 72, 75
Deviance residuals 72, 77
Efficiency 220, 257, 307, 314, 318
  De Pril 242–6
  Loimaranta 240–3
Eigenvalue 175, 184, 295
Eigenvector 175, 179
Ergodicity 176, 273
Exposure-to-risk 20, 44–5, 53, 56, 73, 93
Financial equilibrium 123, 130, 135, 167, 186, 197, 202, 205, 305, 314, 318, 333–5, 340–1, 342
Fisher
  information matrix 38, 40, 69, 73, 104
  score 36, 67–8
Gamma
  distribution 28, 29, 230
  regression 231–2
GEE 101–5
Generalized Pareto
  distribution 224–9
  index 224–6
Gertensgarbe plot 225–6
Information criteria
  AIC 43, 110
  BIC 43, 110
Interaction 58–9, 62–3, 74
Inverse Gaussian
  distribution 31–2, 233
  regression 233–4
Kolmogorov distance 208–13
Large claim 223–30, 256–7
Lemaire algorithm 251–4
Likelihood ratio
  confidence interval 69
  test 41–2, 45, 72, 76, 82
LogNormal
  distribution 34
  regression 235–6, 246–9
Mixed Poisson
  credibility model 126–8, 155–8
  discrete credibility model 135–6, 193–4
  distribution 24–8, 46, 270–2, 278, 326
  process 25
  regression 80–1, 260–3
Negative Binomial
  credibility model 131–5, 144, 148, 151–2
  distribution 28–31, 45–6, 281, 284, 336
  regression 83–6, 105–6, 238
Offset 73, 101
Optimal retention 220–1, 247–9, 251–4, 257–8
Panjer algorithm 279–84, 287, 330–2
Perron-Frobenius theorem 175
Poisson
  distribution 15–17, 45–6, 54–6, 172–3, 229–30, 281, 283
  process 17–21
  regression 64–79, 94–101, 229–30, 237–8
Poisson-Inverse Gaussian
  distribution 32–3, 45–6
  regression 86–7, 106–7
Poisson-LogNormal
  distribution 34, 45–6
  regression 87–8, 107–8
Regularity 175, 176, 179, 273
Score test 43, 82–3
Special bonus rule 200–8
Specification error 72–3, 80
Total variation distance 183–4, 208–13, 295–6, 308, 315
Vuong test 43–4, 46, 89, 109–10
Wald
  confidence interval 69
  test 42–3