AIC (Akaike Information Criterion): A Complete Guide

Published on September 2nd, 2021. Revised on June 16, 2026.

The Akaike Information Criterion (AIC) is a single number that estimates how well a statistical model fits a dataset relative to other candidate models, while penalising unnecessary complexity. It is calculated as AIC = 2k − 2ln(L), where k is the number of estimated parameters and L is the maximised value of the model’s likelihood. The model with the lowest AIC is preferred, because it offers the best trade-off between goodness of fit and parsimony.

AIC was introduced by the Japanese statistician Hirotugu Akaike in 1973 and formalised in his landmark 1974 paper. It is now one of the most widely used tools for model selection across statistics, econometrics, ecology and machine learning. This guide explains exactly what AIC is, the formula and the logic behind it, a fully worked example, how AIC compares with BIC, the small-sample correction AICc, and the drawbacks you should keep in mind.

  • What exactly is the Akaike Information Criterion, and what does its formula mean?
  • When should researchers use AIC, and how do you interpret the results?
  • How does AIC differ from BIC, and when should you use AICc instead?
  • What are the main drawbacks of AIC?

“The proposed minimum AIC procedure provides a versatile procedure for statistical model identification.” — Hirotugu Akaike, “A New Look at the Statistical Model Identification,” IEEE Transactions on Automatic Control, 1974

What is the Akaike Information Criterion?

The Akaike Information Criterion (AIC) is a measure of the relative quality of statistical models for a given set of data. It estimates the amount of information lost when a particular model is used to represent the process that generated the data. The less information a model loses, the higher its quality — and the lower its AIC score.

Crucially, AIC is a relative measure. It does not tell you whether a model is good in any absolute sense; it only ranks models against one another for the same dataset. An AIC value on its own is meaningless — a score of 245 tells you nothing until you compare it with the AIC of a rival model fitted to the identical data. As a rule, the lower the AIC, the better the model.

AIC rewards models that fit the data well but penalises models that use too many parameters. This is the same idea that underpins regularisation in machine learning: a complex model can always be made to fit the training data more closely, but it risks overfitting and generalising poorly to new data. By adding a penalty for each extra parameter, AIC discourages needlessly complicated models and helps you select one that should perform well on out-of-sample data.

The AIC formula

The Akaike Information Criterion is defined as:

AIC = 2k − 2ln(L)

where:

  • k = the number of estimated parameters in the model (including the intercept and the error variance, where applicable);
  • L = the maximised value of the likelihood function for the model — essentially, how probable the observed data are under the fitted model;
  • ln(L) = the natural logarithm of that maximised likelihood (the maximised log-likelihood).

The term 2k is the complexity penalty: it grows as you add parameters. The term −2ln(L) measures goodness of fit: it shrinks as the observed data become more probable under the fitted model. AIC adds the two together, so the best model is the one that achieves a high likelihood without paying too much in extra parameters. Adding parameters typically lowers the fit term but raises the complexity penalty, so the two pull in opposite directions, and minimising AIC strikes a balance between fit and parsimony.

If you already have the log-likelihood of your model, calculating AIC by hand is straightforward arithmetic. The hard part is obtaining the log-likelihood itself — fortunately, virtually every statistical package (R, Python’s statsmodels, SPSS, Stata) reports AIC automatically when you fit a model.
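To make the origin of ln(L) concrete, here is a minimal Python sketch that computes the maximised log-likelihood of a simple normal model by hand and then applies AIC = 2k − 2ln(L). The sample values and the function names (`gaussian_max_log_likelihood`, `aic`) are illustrative assumptions, not part of any statistical package.

```python
import math


def aic(k, log_lik):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def gaussian_max_log_likelihood(sample):
    """Maximised log-likelihood of an i.i.d. normal model.

    Uses the maximum-likelihood estimates: the sample mean and the
    variance with divisor n (not n - 1).
    """
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    # At the MLE the log-likelihood simplifies to -n/2 * (ln(2*pi*var) + 1).
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


# Hypothetical exam scores, made up for illustration.
scores = [52.0, 61.0, 58.0, 70.0, 65.0, 49.0, 73.0, 60.0]

log_lik = gaussian_max_log_likelihood(scores)
k = 2  # two estimated parameters: the mean and the variance
print(f"ln(L) = {log_lik:.3f}, AIC = {aic(k, log_lik):.3f}")
```

In practice the fitted model object in your software would supply ln(L) directly; the sketch only shows what that number represents.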

AIC Worked Example: Comparing Two Models

The quickest way to understand AIC is to compute it. Suppose you are modelling exam scores and you fit two competing regression models to the same dataset. Your statistical software reports the maximised log-likelihood, ln(L), for each.

Example: You compare two models fitted to the same data.

Model A — a simple model with k = 3 parameters and log-likelihood ln(L) = −90.
Model B — a richer model with k = 5 parameters and log-likelihood ln(L) = −87.

Apply AIC = 2k − 2ln(L) to each:

Model A: AIC = 2(3) − 2(−90) = 6 + 180 = 186
Model B: AIC = 2(5) − 2(−87) = 10 + 174 = 184

Model B has the lower AIC (184 vs 186), so it is preferred. Although Model B uses two extra parameters, its better fit (a higher log-likelihood of −87 rather than −90) more than offsets the complexity penalty. The difference in AIC is ΔAIC = 186 − 184 = 2.
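The same arithmetic can be checked in a few lines of Python. This is a small sketch using only the k and ln(L) values quoted for Models A and B; the dictionary layout and variable names are assumptions made for illustration.

```python
def aic(k, log_lik):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


# (k, ln L) for each model, taken from the worked example.
models = {"Model A": (3, -90.0), "Model B": (5, -87.0)}

scores = {name: aic(k, ll) for name, (k, ll) in models.items()}
best = min(scores, key=scores.get)  # the lowest AIC is preferred

for name, score in scores.items():
    delta = score - scores[best]  # difference from the best model
    print(f"{name}: AIC = {score:.0f}, delta = {delta:.0f}")
print("Preferred:", best)
```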

Interpreting the difference (ΔAIC). Only differences between AIC values matter, not the absolute numbers. A common rule of thumb, due to Burnham and Anderson, is:

  • ΔAIC of 0–2: the models have substantial support — little to choose between them;
  • ΔAIC of 4–7: considerably less support for the higher-AIC model;
  • ΔAIC > 10: essentially no support for the higher-AIC model.

In our example ΔAIC = 2, so while Model B wins, the evidence is not decisive — both models remain plausible and you might favour the simpler Model A on grounds of parsimony.
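The rule of thumb can be written as a small lookup. The sketch below is one possible encoding; the function name `describe_delta_aic` and the label for the gaps are assumptions, and the ranges 2–4 and 7–10 are deliberately left unlabelled because the rule as quoted does not cover them.

```python
def describe_delta_aic(delta):
    """Map a delta-AIC value (measured from the best model) to a label."""
    if delta < 0:
        raise ValueError("delta must be measured from the lowest-AIC model")
    if delta <= 2:
        return "substantial support"
    if 4 <= delta <= 7:
        return "considerably less support"
    if delta > 10:
        return "essentially no support"
    # The rule as quoted leaves the ranges 2-4 and 7-10 unlabelled.
    return "between the quoted bands"


for delta in (0, 2, 5, 12):
    print(delta, "->", describe_delta_aic(delta))
```

For the worked example, `describe_delta_aic(186 - 184)` falls in the first band, which matches the conclusion that the evidence for Model B is not decisive.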

The table below summarises the comparison.

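Model            | Parameters (k) | Log-likelihood ln(L) | AIC = 2k − 2ln(L) | ΔAIC
-----------------|----------------|----------------------|-------------------|-----
Model A (simple) | 3              | −90                  | 186               | 2
Model B (richer) | 5              | −87                  | 184 (best)        | 0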

AIC balances two things:

  • Goodness of fit: a lower −2ln(L) means a better fit.
  • Model complexity: a 2k penalty for extra parameters.

When to Use AIC

AIC is one of the most widely used model-selection methods in statistics. You determine the best fit for your data by calculating and comparing the AIC scores of several candidate models, then choosing the one with the lowest score.

AIC is especially valuable when conventional machine-learning practice — splitting data into training, validation and test sets — is impractical. With small samples or time-series data, holding back a large test set is wasteful or impossible, because the most informative observations (often the most recent ones) would be locked away in the validation and test sets. Training on all the data and using AIC to penalise complexity can therefore yield better model selection than a conventional train/validation/test split in these settings.

When testing a hypothesis, you may collect data on several factors whose influence you are unsure about, particularly when exploring a new idea. You then want to know which of your independent variables best account for the variation in your dependent variable. Building a set of models, each containing a different combination of the predictors you have measured, is an effective way to find out.

The combinations you test should be guided by the following:

  1. Your understanding of the research system — do not include predictors that are not logically connected to the outcome. With enough variables you can manufacture spurious correlations between almost anything.
  2. The experimental design — if, for example, you have applied two treatments to separate groups of subjects, there is usually no reason to test for an interaction between them.

Once you have a handful of candidate models, you compare them using AIC. Models with the lowest AIC are favoured, and AIC penalises models that carry more parameters. When two models explain the same amount of variation, the one with fewer parameters has the lower AIC and is judged the better fit — a direct application of the principle of parsimony. AIC is commonly used to choose between competing regression specifications, to select the order of time-series models such as ARIMA, and to compare nested and non-nested models in inferential statistics.
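As a rough illustration of comparing a candidate set, the Python sketch below fits two regression models to the same made-up data and ranks them by AIC. It assumes normally distributed errors, so the maximised log-likelihood can be computed from the residual sum of squares (RSS); the data, function names and candidate set are all hypothetical.

```python
import math


def aic(k, log_lik):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def gaussian_log_lik_from_rss(rss, n):
    """Maximised log-likelihood of a least-squares fit with normal errors."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def rss_intercept_only(x, y):
    """RSS of the model y = b0 + error (x is ignored)."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def rss_linear(x, y):
    """RSS of the model y = b0 + b1 * x + error, fitted by least squares."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
score = [48, 55, 53, 61, 64, 70, 69, 77, 80, 84]

# Each candidate: (function returning the RSS, number of parameters k).
# k counts the regression coefficients plus the error variance.
candidates = {
    "intercept only": (rss_intercept_only, 2),
    "intercept + slope": (rss_linear, 3),
}

n = len(score)
aic_scores = {}
for name, (rss_fn, k) in candidates.items():
    rss = rss_fn(hours, score)
    aic_scores[name] = aic(k, gaussian_log_lik_from_rss(rss, n))

best = min(aic_scores, key=aic_scores.get)
for name, value in aic_scores.items():
    print(f"{name}: AIC = {value:.2f}")
print("Lowest AIC:", best)
```

Both candidates are fitted to the identical data and response, which is what makes their AIC values comparable.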

AIC vs BIC vs AICc

AIC is closely related to two other information criteria you will encounter: the Bayesian Information Criterion (BIC) and the small-sample-corrected AICc. They share the same structure — a goodness-of-fit term plus a complexity penalty — but differ in how harshly they penalise extra parameters.

BIC = k·ln(n) − 2ln(L), where n is the sample size. The crucial difference is the penalty: AIC charges 2 per parameter, whereas BIC charges ln(n) per parameter. Because ln(n) exceeds 2 whenever n is 8 or more (e² ≈ 7.39), BIC penalises complexity more heavily than AIC in all but the tiniest samples and therefore tends to select simpler models, especially with large samples. The two criteria also have different goals: AIC aims to minimise prediction error and find the model that best approximates reality, while BIC assumes a “true” model exists among the candidates and tries to identify it (it is consistent, selecting the true model with probability approaching 1 as n grows).
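The different penalties are easy to see numerically. The Python sketch below prints ln(n) for a few sample sizes and then scores Models A and B from the worked example under both criteria. The sample size n = 50 is an assumption made purely for illustration, since the example does not state one.

```python
import math


def aic(k, log_lik):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def bic(k, log_lik, n):
    """BIC = k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik


# Per-parameter penalty: AIC charges 2, BIC charges ln(n).
for n in (5, 7, 8, 50, 1000):
    print(f"n = {n:>4}: ln(n) = {math.log(n):.3f}")

# Models A and B from the worked example; n = 50 is an assumed sample size.
n = 50
models = {"Model A": (3, -90.0), "Model B": (5, -87.0)}
for name, (k, ll) in models.items():
    print(f"{name}: AIC = {aic(k, ll):.2f}, BIC = {bic(k, ll, n):.2f}")
```

Under that assumed n, BIC ranks the simpler Model A first while AIC ranks Model B first, which illustrates how the heavier penalty can change the choice.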

AICc is a correction to AIC for small samples:

AICc = AIC + (2k² + 2k) / (n − k − 1)

When the sample size n is small relative to the number of parameters k, ordinary AIC tends to under-penalise complexity and over-fit. AICc adds an extra penalty term that grows as k approaches n. As n becomes large the correction term shrinks towards zero and AICc converges to AIC. A common recommendation (Burnham and Anderson) is to use AICc rather than AIC whenever n/k is less than about 40.
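A short Python sketch shows how the correction behaves. It reuses the k and ln(L) values of Models A and B from the worked example and holds them fixed while varying n; that is a simplifying assumption, because real log-likelihoods change with the sample. The sample sizes are hypothetical.

```python
def aic(k, log_lik):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def aicc(k, log_lik, n):
    """AICc = AIC + (2k^2 + 2k) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(k, log_lik) + (2 * k**2 + 2 * k) / (n - k - 1)


models = {"Model A": (3, -90.0), "Model B": (5, -87.0)}

for n in (20, 50, 10_000):  # assumed sample sizes
    row = ", ".join(
        f"{name}: {aicc(k, ll, n):.2f}" for name, (k, ll) in models.items()
    )
    print(f"n = {n:>6} -> {row}")
```

With these numbers the correction is large enough at n = 20 to favour Model A, while at n = 50 and beyond the ranking matches plain AIC and the correction fades towards zero.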

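Criterion | Formula                      | Penalty per parameter | Best used when…
----------|------------------------------|-----------------------|---------------------------------------------------
AIC       | 2k − 2ln(L)                  | 2                     | Goal is prediction; moderate to large samples
AICc      | AIC + (2k² + 2k)/(n − k − 1) | > 2 (grows as k→n)    | Small samples (n/k < ~40)
BIC       | k·ln(n) − 2ln(L)             | ln(n)                 | Goal is to identify the “true” model; large samples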

For all three criteria the rule is the same: compute the score for every candidate model fitted to the same data, and choose the model with the lowest value.

Drawbacks of the Akaike Information Criterion

AIC is powerful but not without limitations:

  • It is only relative. AIC assesses models against one another, not against an absolute standard. Every model in your candidate set could be poor, and AIC would still hand you a “best” one. Always check absolute fit with other diagnostics (residual plots, R², predictive checks).
  • The candidate set matters. AIC can only choose among the models you supply. If the genuinely best model is not in your set, AIC cannot find it.
  • It can over-fit in small samples. With little data, plain AIC tends to favour overly complex models — which is exactly why AICc exists.
  • Same data, same response, comparable likelihoods. AIC values are only comparable when models are fitted to the identical dataset and the same response variable, using a consistently defined likelihood.
  • Newer methods may be more accurate. AIC has been refined and, in some settings, surpassed by more computationally demanding measures such as WAIC (Watanabe–Akaike Information Criterion), DIC (Deviance Information Criterion) and LOO-CV (Leave-One-Out Cross-Validation). For large samples, AIC is asymptotically equivalent to LOO-CV.

Which criterion you choose depends on the trade-off between accuracy and computational effort, and on what your software supports.

To Conclude

When you have plenty of data, the simplest and most reliable way to assess a model’s performance is the standard machine-learning approach of a train, validation and test split. But when that is impractical — with small samples or time-series data — the Akaike Information Criterion is an excellent alternative. Remember the essentials: AIC = 2k − 2ln(L), lower is better, only differences between models are meaningful, and switch to AICc when your sample is small. Understanding these few rules lets you use one of statistics’ most enduring tools with confidence.

Frequently Asked Questions

What is the Akaike Information Criterion (AIC)?

AIC is a measure of the relative quality of statistical models for a given dataset. It estimates how much information a model loses, balancing goodness of fit against complexity. It is calculated as AIC = 2k − 2ln(L), where k is the number of parameters and L is the maximised likelihood. The model with the lowest AIC is preferred.

What is the AIC formula?

The formula is AIC = 2k − 2ln(L). Here, k is the number of estimated parameters in the model and ln(L) is the maximised log-likelihood. The 2k term penalises complexity, while −2ln(L) rewards a good fit to the data.

Is a lower or higher AIC better?

A lower AIC is better. A smaller AIC indicates a model that loses less information and strikes a better balance between fit and parsimony. Because AIC is relative, only the differences between models fitted to the same data are meaningful; a difference (ΔAIC) of less than 2 means the models are roughly equivalent.

What is the difference between AIC and BIC?

Both balance fit against complexity, but they penalise parameters differently. AIC charges 2 per parameter (2k − 2ln(L)), while BIC charges ln(n) per parameter (k·ln(n) − 2ln(L)), where n is the sample size. BIC penalises complexity more heavily for samples of 8 or more observations, so it tends to select simpler models. AIC targets prediction accuracy; BIC tries to identify the true model.

When should you use AICc instead of AIC?

Use AICc, the small-sample correction, when your sample size is small relative to the number of parameters; a common guideline is when n/k is below about 40. AICc adds the term (2k² + 2k)/(n − k − 1) to AIC, which guards against over-fitting. As the sample grows, AICc converges to ordinary AIC.

Who developed the Akaike Information Criterion?

AIC was developed by the Japanese statistician Hirotugu Akaike. He introduced the idea in 1973 and presented it formally in his 1974 paper “A New Look at the Statistical Model Identification.” The criterion is named in his honour.

About Owen Ingram

Owen Ingram is a dissertation specialist. He has a master's degree in data sciences. His research work aims to compare the various types of research methods used among academicians and researchers.
