The Kirkpatrick Model’s Four Levels of Training Evaluation provide a widely used framework for measuring the effectiveness of training and learning initiatives. Developed by Donald L. Kirkpatrick in the 1950s, the model lets organizations trace learning impact from participants' immediate reactions all the way to organizational results.
Why Plan for Impact Measurement?
Evaluation should be designed before training launches, not bolted on after. Three reasons this matters:
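- Alignment with organizational goals: Deciding how impact will be measured before design begins ties the training to the organization's goals and strategy from the start.
- Resource allocation: Planning measurement in advance lets you budget for data collection, including baseline data that can only be captured before training starts.
- Stakeholder expectations: Pre-planning lets all parties agree on success criteria before any content is built.
Questions to ask stakeholders during planning: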
- What specific improvements are expected as a result of this training?
- What key performance indicators (KPIs) are relevant to this project?
- What does success look like from your perspective?
The Four Levels of Training Evaluation
Reaction
Measuring participants' initial reactions and engagement with the training experience. This is the most commonly collected data — and the least predictive of actual impact on its own.
- Alignment of pre-training expectations with actual outcomes
- Completion rate
- Participant feedback through surveys or feedback forms
- Post-training surveys assessing satisfaction and perceived value
- User ratings and reviews
- Time spent on training
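Most of the Level 1 metrics above are simple aggregates of per-participant records. The sketch below assumes a hypothetical record holding a completion flag, a 1-to-5 satisfaction rating, and minutes spent; the `ReactionRecord` schema and `summarize_reaction` function are illustrative names, not part of any standard tool. It shows how raw survey and platform data might roll up into completion rate, average satisfaction, and average time spent.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReactionRecord:
    """One participant's Level 1 data (hypothetical schema)."""
    completed: bool        # did the participant finish the training?
    satisfaction: int      # post-training survey rating, assumed 1-5 scale
    minutes_spent: float   # time spent in the training


def summarize_reaction(records: list[ReactionRecord]) -> dict[str, float]:
    """Roll raw Level 1 records up into headline reaction metrics."""
    if not records:
        raise ValueError("no reaction data collected")
    return {
        # Booleans sum as 0/1, so this is the share who completed.
        "completion_rate": sum(r.completed for r in records) / len(records),
        "avg_satisfaction": mean(r.satisfaction for r in records),
        "avg_minutes": mean(r.minutes_spent for r in records),
    }


# Hypothetical responses from four participants.
records = [
    ReactionRecord(completed=True, satisfaction=5, minutes_spent=42.0),
    ReactionRecord(completed=True, satisfaction=4, minutes_spent=38.5),
    ReactionRecord(completed=False, satisfaction=2, minutes_spent=12.0),
    ReactionRecord(completed=True, satisfaction=4, minutes_spent=40.0),
]
print(summarize_reaction(records))
```

Even a strong average here says nothing about learning or behavior change; these figures are a starting point, not evidence of impact.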
Learning
Assessing actual learning outcomes — the extent to which participants have acquired new knowledge, skills, and attitudes as a result of the training.
- Pre- and post-assessment scores measuring knowledge and skills improvement
- Skill assessments for specific competencies
- Knowledge retention rates
- Certification rates
- Competency development progress
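Pre- and post-assessment scores are most informative when compared participant by participant. The sketch below is a minimal example assuming matched score lists on a 0-to-100 scale; `learning_gains` and `retention_rate` are hypothetical helpers. Alongside the raw gain, it reports normalized gain, a measure borrowed from education research (not part of Kirkpatrick's model) that divides each participant's improvement by the room they had to improve. Retention is treated here as the delayed re-test average divided by the immediate post-test average, which is one possible definition among several.

```python
from statistics import mean


def learning_gains(pre: list[float], post: list[float],
                   max_score: float = 100.0) -> dict[str, float]:
    """Summarize matched pre/post assessment scores.

    pre[i] and post[i] must belong to the same participant.
    """
    if not pre or len(pre) != len(post):
        raise ValueError("need matched, non-empty pre/post score lists")
    # Raw gain: points gained per participant.
    raw = [after - before for before, after in zip(pre, post)]
    # Normalized gain: share of the available headroom actually gained.
    # Participants already at max_score on the pre-test are skipped,
    # since they had no room to improve.
    normalized = [
        (after - before) / (max_score - before)
        for before, after in zip(pre, post)
        if before < max_score
    ]
    return {
        "mean_raw_gain": mean(raw),
        "mean_normalized_gain": mean(normalized) if normalized else 0.0,
    }


def retention_rate(post: list[float], delayed: list[float]) -> float:
    """Delayed re-test average as a fraction of the immediate post-test average."""
    return mean(delayed) / mean(post)


# Hypothetical scores for three participants.
pre = [40, 60, 80]
post = [70, 75, 90]
delayed = [65, 70, 84]   # same participants, re-tested weeks later
print(learning_gains(pre, post))
print(round(retention_rate(post, delayed), 3))
```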
Behavior
Assessing the practical application of learning within the workplace — behavioral changes and their impact on job performance. This is where training value is truly tested.
- On-the-job performance metrics: error rates, productivity, quality
- Compliance with newly introduced protocols
- Employee turnover rate
- Internal promotions as a result of improved skills
- Leadership effectiveness
- Team collaboration quality
- Customer feedback from interactions with trained employees
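Behavior metrics such as error rates only show change when compared against a pre-training baseline and measured after people have had time to apply what they learned. The sketch below assumes hypothetical dated observations of errors and tasks, and a 30-to-90-day follow-up window, a range commonly used for Level 3 checks; the `Observation` type and `behavior_change` function are illustrative, not from any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Observation:
    """One hypothetical on-the-job measurement for a trained group."""
    observed_on: date
    errors: int
    tasks: int


def error_rate(observations: list[Observation]) -> float:
    """Errors per task across a set of observations."""
    total_tasks = sum(o.tasks for o in observations)
    if total_tasks == 0:
        raise ValueError("no tasks observed in this window")
    return sum(o.errors for o in observations) / total_tasks


def behavior_change(observations: list[Observation], training_date: date,
                    window: tuple[int, int] = (30, 90)) -> dict[str, Optional[float]]:
    """Compare the pre-training baseline with a post-training follow-up window.

    window gives the follow-up period in days after training_date (assumed 30-90).
    """
    baseline = [o for o in observations if o.observed_on < training_date]
    start, end = window
    follow_up = [
        o for o in observations
        if start <= (o.observed_on - training_date).days <= end
    ]
    before, after = error_rate(baseline), error_rate(follow_up)
    return {
        "baseline_error_rate": before,
        "follow_up_error_rate": after,
        # Relative change is undefined when the baseline had no errors.
        "relative_change": (after - before) / before if before else None,
    }


obs = [
    Observation(date(2024, 2, 1), errors=12, tasks=200),
    Observation(date(2024, 2, 20), errors=10, tasks=200),
    Observation(date(2024, 3, 10), errors=9, tasks=200),   # day 9: too early, ignored
    Observation(date(2024, 4, 15), errors=6, tasks=200),   # day 45
    Observation(date(2024, 5, 20), errors=4, tasks=200),   # day 80
]
print(behavior_change(obs, training_date=date(2024, 3, 1)))
```

In this made-up data the error rate falls from 5.5% to 2.5%, a relative drop of about 55%. The comparison is only possible because baseline observations were collected before training; it does not, on its own, rule out other causes of the improvement.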
Results
Measuring the tangible impact of training on organizational goals and objectives. Level 4 data is the hardest to collect — and the most compelling for stakeholders.
- Revenue growth
- Profit margin improvements
- Market share increase
- Customer satisfaction scores
- Net Promoter Score (NPS)
- Customer retention rate
- Employee turnover cost savings
- Cost reductions attributable to training
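Two of the Results metrics lend themselves to a short worked sketch. NPS follows its standard definition: the percentage of promoters (ratings of 9 or 10 on a 0-to-10 scale) minus the percentage of detractors (ratings of 0 to 6). Claiming a change is "attributable to training" is harder; one common approach, swapped in here and not part of Kirkpatrick's model itself, is a difference-in-differences comparison against a similar untrained group. The function names and survey data below are hypothetical.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score from 0-10 'how likely are you to recommend' ratings."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)    # 9-10
    detractors = sum(1 for s in scores if s <= 6)   # 0-6; 7-8 are passives
    return 100.0 * (promoters - detractors) / len(scores)


def diff_in_diff(trained_before: float, trained_after: float,
                 comparison_before: float, comparison_after: float) -> float:
    """Change in the trained group minus change in an untrained comparison group.

    Subtracting the comparison group's change removes shifts that affect
    everyone (seasonality, a price change, a new product), assuming both
    groups would otherwise have followed parallel trends.
    """
    return (trained_after - trained_before) - (comparison_after - comparison_before)


# Hypothetical customer ratings for a trained team and a similar untrained team.
trained_pre = [9, 7, 6, 10, 8, 5, 9, 7, 6, 8]
trained_post = [9, 10, 8, 9, 7, 10, 6, 9, 8, 9]
comparison_pre = [8, 6, 9, 7, 5, 9, 8, 6, 10, 7]
comparison_post = [9, 7, 6, 10, 8, 9, 7, 5, 9, 8]

effect = diff_in_diff(nps(trained_pre), nps(trained_post),
                      nps(comparison_pre), nps(comparison_post))
print(f"NPS change attributable to training (estimate): {effect:+.0f} points")
```

Here the trained team's NPS rises 50 points while the comparison team's rises 20, so roughly 30 points might be credited to the training, provided the two teams would otherwise have moved in parallel.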
Measuring Success and Impact
Measuring consistently across all four levels depends on a few core practices:
- Baseline data: Collect data before implementing the training program as a reference point
- Regular tracking: Continuously monitor selected metrics throughout the program
- Feedback loops: Encourage feedback from participants and stakeholders
- Qualitative insights: Use participant testimonials, case studies, and anecdotes to complement quantitative data
- Data analysis: Identify trends, correlations, and outliers within collected metrics
- Benchmarking: Compare against industry standards or competitors
- Iterate and improve: Make adjustments based on data insights for future programs
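Baseline data, regular tracking, and data analysis fit together naturally. The sketch below assumes a hypothetical weekly quality score recorded for four weeks before training and eight weeks after. It compares the tracked mean with the baseline, fits an ordinary least-squares trend, and flags outliers with a simple z-score rule. The helper names, the data, and the z > 2 cutoff are all illustrative assumptions.

```python
from statistics import mean, pstdev


def change_vs_baseline(baseline: list[float], tracked: list[float]) -> float:
    """Tracked-period mean minus the pre-training baseline mean."""
    return mean(tracked) - mean(baseline)


def trend_slope(values: list[float]) -> float:
    """Ordinary least-squares slope per period (e.g., per week)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    x_bar, y_bar = (n - 1) / 2, mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def outliers(values: list[float], z_cutoff: float = 2.0) -> list[int]:
    """Indices of periods more than z_cutoff standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_cutoff]


# Hypothetical weekly quality scores (%), before and during the program.
baseline = [82, 80, 81, 83]
tracked = [85, 86, 87, 88, 60, 90, 91, 92]

print(change_vs_baseline(baseline, tracked))   # mean shift vs. baseline
print(trend_slope(tracked))                    # points per week
print(outliers(tracked))                       # weeks worth a closer look
```

In this data the tracked mean sits 3.375 points above baseline, and the fifth tracked week (index 4, a score of 60) is flagged. Whether that week reflects a data-entry error or a real dip is a question for qualitative follow-up, not something the code can settle.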
You don't need to measure all four levels for every training program. Prioritize Level 3 and Level 4 data for high-stakes initiatives where business impact is critical, and Level 1 and Level 2 for compliance or awareness programs where reaction and knowledge gain are the primary goals.
Key Questions Answered
The most commonly asked questions about this topic, concisely answered.
- What is Kirkpatrick's Four Levels of Training Evaluation? It is an evaluation framework developed by Donald Kirkpatrick in the 1950s that measures training effectiveness at four progressive levels: Reaction (did participants find it valuable?), Learning (did they gain knowledge or skills?), Behavior (did they apply it on the job?), and Results (did it affect organizational goals?).
- Why plan evaluation before the training is built? Planning evaluation upfront ensures training is aligned to measurable outcomes from the start. It forces agreement with stakeholders on what success looks like before any content is built, prevents retrofitting metrics that don't match the training, and allocates resources for data collection while baselines can still be established.
- What is the difference between Level 1 and Level 2? Level 1 (Reaction) measures how participants felt about the training: satisfaction, perceived relevance, and engagement. Level 2 (Learning) measures what they actually gained: knowledge, skills, or attitude changes. A positive reaction does not guarantee that learning occurred, so the two levels must be measured separately.
- How is Level 3 (Behavior) measured? Level 3 measures on-the-job application of training, typically 30 to 90 days after training. Methods include manager observations, performance metrics (error rates, productivity), 360-degree feedback, compliance auditing, and structured interviews. Collecting Level 3 data effectively requires collaboration between L&D and line managers.
- Why is Level 4 (Results) the hardest to measure? It requires isolating the training's contribution to business outcomes (revenue, customer satisfaction, retention) from all the other variables affecting those metrics. It demands baseline data collected before training, longitudinal tracking, and often statistical controls that many organizations lack the resources to implement.
- Do you need to measure all four levels for every program? No. Prioritize Level 3 and Level 4 for high-stakes programs where business impact is critical, such as leadership development, safety training, and sales performance. Level 1 and Level 2 are sufficient for compliance or awareness programs where the goal is knowledge gain rather than behavioral change. Measuring all four levels for every program is rarely practical.
- What is the most common evaluation mistake? Relying exclusively on Level 1 "smile sheets" and equating participant satisfaction with training effectiveness. Positive reactions are useful but say nothing about learning or behavioral transfer. Another common mistake is collecting Level 2 assessment data immediately after training without following up weeks later to check whether behavior actually changed.
- How does the Phillips ROI Methodology relate to Kirkpatrick's model? It extends Kirkpatrick's four levels by adding a fifth level, Return on Investment, which converts Level 4 results into a monetary value and compares it to the cost of training. Kirkpatrick's model identifies what changed; Phillips quantifies whether the change was worth the investment.
- How does Kirkpatrick's model relate to Backward Design? Both frameworks share the principle of starting with the end in mind. In Backward Design, you define desired outcomes before designing instruction. In Kirkpatrick's model, you define what Level 3 and Level 4 success looks like before designing Level 1 and Level 2 measurement, and before designing the training itself. Together, they ensure training is both outcome-aligned and evaluable.
- What baseline data should you collect before training? Collect data that directly corresponds to your Level 3 and Level 4 goals: current performance metrics (error rates, sales figures, compliance scores), pre-assessment scores for Level 2 measurement, and any organizational KPIs the training aims to improve. Without a baseline, post-training data has no reference point and cannot demonstrate impact.
- How can AI and learning analytics support evaluation? They can automate and deepen evaluation at every level: for Level 1, sentiment analysis of open-text feedback; for Level 2, adaptive assessments that measure knowledge retention over time; for Level 3, automated tracking of on-the-job behavior changes through system logs; and for Level 4, correlation of training participation with business KPIs in dashboards. AI reduces the manual effort that traditionally made Level 3 and Level 4 evaluation impractical.
- Is the Kirkpatrick Model still relevant? The model has evolved since the 1950s, most notably through the New World Kirkpatrick Model developed by Jim and Wendy Kirkpatrick, which reverses the planning order (starting from Level 4 results and working backward) and adds leading indicators. The core principle, evaluating training at multiple levels beyond learner satisfaction, remains as relevant as ever, even as frameworks like Phillips ROI extend it further.
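Once Level 4 results have been converted to money and isolated to the training, the Phillips calculation itself is simple arithmetic. The sketch below computes the benefit-cost ratio (BCR = benefits / costs) and the ROI percentage (ROI = (benefits - costs) / costs x 100) as commonly associated with the Phillips methodology. The function name and dollar figures are hypothetical.

```python
def phillips_roi(monetary_benefits: float, program_costs: float) -> dict[str, float]:
    """Benefit-cost ratio and ROI percentage for a training program.

    monetary_benefits: Level 4 results converted to money and isolated
        to the training's contribution (the hard part, done beforehand).
    program_costs: fully loaded cost of designing and delivering the program.
    """
    if program_costs <= 0:
        raise ValueError("program costs must be positive")
    net_benefits = monetary_benefits - program_costs
    return {
        "bcr": monetary_benefits / program_costs,
        "roi_percent": 100.0 * net_benefits / program_costs,
    }


# Hypothetical figures for illustration only.
print(phillips_roi(monetary_benefits=150_000, program_costs=60_000))
```

Here $150,000 of monetized benefits against $60,000 of costs gives a BCR of 2.5 and an ROI of 150%: each dollar spent came back along with $1.50 in net benefit.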