KIP 286: Permissionless Validator Lifecycle

Author	Lewis, Ian, Ollie, Lake, and Joseph
Discussions-To	https://github.com/kaiachain/kips/issues/86
Status	Draft
Type	Core
Created	2026-01-19
Requires	227

Abstract

This KIP introduces an automatic state transition framework for candidates and validators in the permissionless Kaia network. It defines 9 states (4 candidate states, 5 validator states) and their allowed transitions, enabling decentralized validator lifecycle management without manual intervention from the Kaia team.

Motivation

The current Kaia network operates in a permissioned manner where the Kaia team manually manages validator states. As the network transitions to permissionless operation, an automatic state management system becomes necessary.

With KIP-227 introducing VRank for quantitative evaluation of validators, this KIP establishes the state machine that governs:

How candidates join the network through VRank testing
How validators participate in consensus based on stake ranking
How validators are paused or removed upon VRank violations
How validators can voluntarily exit or perform maintenance

This framework enables trustless validator lifecycle management while maintaining network stability through structured state transitions.

Specification

Parameters

The following parameters are used in this KIP:

Parameter	Description	Sample Value
`MaxValidatorCount`	Maximum number of validators across all validator states (ValActive, ValReady, ValInactive, ValPaused, ValExiting)	100
`ActiveValidatorCount`	Number of top-staked validators that are in `ValActive` or `ValPaused` when an epoch starts	50
`MaxReadyCandidateCount`	Maximum number of candidates that can be in `CandReady`	3
`MinStake`	Minimum staking amount required to participate in consensus	5,000,000 KAIA
`ValPausedTimeout`	Maximum duration a validator can remain in `ValPaused` before transitioning to `ValInactive`	8 hours
`ValIdleTimeout`	Maximum duration a validator can remain in `ValInactive` or `ValReady` before transitioning to `CandInactive`	30 days
`ValPausedSlotLimit`	Maximum number of validators that can be in `ValPaused`	`F` / 2, where `F` is the Byzantine fault tolerance factor
`ValExitingSlotLimit`	Maximum number of validators that can be in `ValExiting`	`F` / 2, where `F` is the Byzantine fault tolerance factor
`VRankEpoch (Epoch)`	Epoch interval for VRank, in blocks	86400
`PFS_THRESHOLD`	Threshold for proposal failure score (PFS) per epoch. PFS ≥ PFS_THRESHOLD indicates severe violation (defined in KIP-227)	TBD
`CFS_THRESHOLD`	Threshold for candidate failure score during VRank testing. CFS ≥ CFS_THRESHOLD indicates failing the testing (defined in KIP-227)	TBD
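
As a rough illustration of how the two slot limits could be derived, the sketch below assumes the usual BFT bound `F = (n - 1) // 3` for a committee of `n` validators, takes `n` to be `ActiveValidatorCount`, and uses floor division for `F / 2`. None of these choices is fixed by this KIP, and the function names are hypothetical.

```python
def bft_fault_tolerance(committee_size: int) -> int:
    """Assumed BFT bound: a committee of n tolerates F = floor((n - 1) / 3) faults."""
    if committee_size < 1:
        raise ValueError("committee_size must be positive")
    return (committee_size - 1) // 3


def slot_limit(committee_size: int) -> int:
    """ValPausedSlotLimit / ValExitingSlotLimit as F / 2 (floor division assumed)."""
    return bft_fault_tolerance(committee_size) // 2


# With the sample ActiveValidatorCount of 50: F = 16, so each limit is 8.
ACTIVE_VALIDATOR_COUNT = 50
print(slot_limit(ACTIVE_VALIDATOR_COUNT))  # 8
```

Under these assumptions, at most 8 validators could be paused and at most 8 could be exiting at the same time.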

Overview

The framework defines 9 states divided into two categories:

Candidate States: Unknown (placeholder), CandInactive, CandReady, CandTesting
Validator States: ValActive, ValReady, ValInactive, ValPaused, ValExiting

State transitions may occur either at epoch interval or at arbitrary blocks. Any event that occurred at block N must take effect starting from block N+1. For example, if a validator in ValActive state requests a transition to ValPaused at block N, the validator remains in ValActive for block N, and transitions to ValPaused starting from block N+1.
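
The N+1 rule can be sketched as a small history of (effective-from block, state) pairs. This is a minimal illustration, not the client's data model; `ValidatorRecord` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ValidatorRecord:
    """Tracks (effective_from_block, state) pairs in ascending block order."""
    history: list = field(default_factory=lambda: [(0, "ValActive")])

    def request_transition(self, block: int, new_state: str) -> None:
        # An event included in block N takes effect starting from block N+1.
        self.history.append((block + 1, new_state))

    def state_at(self, block: int) -> str:
        # The effective state is the latest entry whose start block is <= block.
        effective = [state for start, state in self.history if start <= block]
        return effective[-1]


v = ValidatorRecord()
v.request_transition(block=100, new_state="ValPaused")
print(v.state_at(100))  # ValActive
print(v.state_at(101))  # ValPaused
```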

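Only validators in ValActive state participate in consensus and receive rewards. The active validator set consists of the top `ActiveValidatorCount` validators by staking amount at the epoch interval block (50 with the sample values, referred to as "top 50" in the rest of this document).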

Idle Timer: Validators in ValInactive or ValReady states are subject to ValIdleTimeout. The idle timer accumulates across both states and only resets when the validator transitions to CandInactive or ValActive. Transitions between ValInactive and ValReady do not reset the timer. This mechanism prevents a validator from remaining indefinitely in the ValInactive/ValReady states.
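
A minimal sketch of the idle timer rule follows. It assumes timestamps in seconds and a hypothetical `IdleTimer` helper; how the client actually measures duration is not specified here.

```python
IDLE_STATES = {"ValInactive", "ValReady"}
RESET_STATES = {"CandInactive", "ValActive"}
VAL_IDLE_TIMEOUT = 30 * 24 * 60 * 60  # sample value of 30 days, in seconds (unit assumed)


class IdleTimer:
    def __init__(self) -> None:
        self.idle_since = None  # timestamp at which the current idle period began

    def on_transition(self, new_state: str, now: int) -> None:
        if new_state in RESET_STATES:
            # Only CandInactive and ValActive reset the timer.
            self.idle_since = None
        elif new_state in IDLE_STATES and self.idle_since is None:
            # First entry into an idle state starts the timer.
            self.idle_since = now
        # Moving between ValInactive and ValReady keeps the existing start time.

    def timed_out(self, now: int) -> bool:
        return self.idle_since is not None and now - self.idle_since >= VAL_IDLE_TIMEOUT


DAY = 24 * 60 * 60
timer = IdleTimer()
timer.on_transition("ValInactive", now=0)
timer.on_transition("ValReady", now=10 * DAY)  # does not reset the timer
print(timer.timed_out(now=30 * DAY))  # True
```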

Candidate States

State	Description
Unknown	Conceptual state representing entities not registered in the system.
CandInactive	Candidate not ready to participate in VRank testing at the next epoch.
CandReady	Candidate has signaled readiness to participate in VRank testing at the next epoch.
CandTesting	Candidate undergoing VRank testing to prove infrastructure reliability.

Validator States

State	Description
ValActive	Active validator participating in consensus and earning rewards (= committee). Must be in top 50 by staking amount.
ValReady	Validator has signaled readiness to become `ValActive`. Waiting for top 50 position.
ValInactive	Inactive validator not participating in consensus. Must signal readiness or exit to avoid the idle timeout.
ValPaused	Validator in maintenance/recovery mode. May be voluntary or forced by VRank violation.
ValExiting	Transitional state for current epoch. Becomes `ValInactive` at next epoch.

State Transitions

State transitions are categorized by timing:

Epoch Interval: Evaluated and executed only when a new epoch starts
Anytime: Can occur immediately upon transaction during any block
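
To make the epoch-interval timing concrete, the sketch below uses the `block.number % VRankEpoch == 0` condition from the Epoch Transition section together with the sample `VRankEpoch` of 86400 blocks. The helper names are hypothetical.

```python
VRANK_EPOCH = 86400  # sample value from the parameter table, in blocks


def is_epoch_start(block_number: int) -> bool:
    """True for the first block of an epoch, where epoch transitions run."""
    return block_number % VRANK_EPOCH == 0


def epoch_range(block_number: int) -> tuple[int, int]:
    """Return the (first, last) block of the epoch containing block_number."""
    start = block_number - block_number % VRANK_EPOCH
    return start, start + VRANK_EPOCH - 1


print(is_epoch_start(86400))  # True
print(epoch_range(100000))    # (86400, 172799)
```

Epoch-interval transitions for the epoch returned by `epoch_range` run at the start of the block right after its last block. Anytime transitions are not tied to this boundary.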

Transition Conditions

A User tx is a transaction initiated by the user, while a System tx is a system operation initiated by the core client, following the same convention as EIP-4788.

Registration & Deregistration

From	To	Timing	Condition	Trigger
Unknown	CandInactive	Anytime	-	User tx
CandInactive	Unknown	Anytime	-	User tx

Candidate Lifecycle

From	To	Timing	Condition	Trigger
CandInactive	CandReady	Anytime	Over `MinStake` AND `CandReady count < MaxReadyCandidateCount` AND `Val* count + CandReady count + CandTesting count < MaxValidatorCount`	User tx
CandReady	CandInactive	Anytime	-	User tx
CandReady	CandInactive	Epoch	Below `MinStake`	System tx
CandReady	CandTesting	Epoch	Over `MinStake`	System tx
CandTesting	CandInactive	Epoch	CFS ≥ `CFS_THRESHOLD` (failed VRank testing)	System tx
CandTesting	ValActive	Epoch	Pass VRank AND Top 50 by stake	System tx
CandTesting	ValInactive	Epoch	Pass VRank AND Below Top 50	System tx
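
The CandInactive → CandReady condition can be sketched as a single predicate. The constants are the sample values from the parameter table, "Over `MinStake`" is read as `stake >= MinStake` to match the pseudocode in the Epoch Transition section, and `Counts` and `can_signal_ready` are hypothetical names.

```python
from dataclasses import dataclass

MIN_STAKE = 5_000_000          # sample value, in KAIA
MAX_READY_CANDIDATE_COUNT = 3  # sample value
MAX_VALIDATOR_COUNT = 100      # sample value


@dataclass(frozen=True)
class Counts:
    validators: int    # entities in any Val* state
    cand_ready: int    # entities in CandReady
    cand_testing: int  # entities in CandTesting


def can_signal_ready(stake: int, counts: Counts) -> bool:
    """Check the three conditions for CandInactive -> CandReady."""
    total = counts.validators + counts.cand_ready + counts.cand_testing
    return (
        stake >= MIN_STAKE
        and counts.cand_ready < MAX_READY_CANDIDATE_COUNT
        and total < MAX_VALIDATOR_COUNT
    )


print(can_signal_ready(6_000_000, Counts(validators=90, cand_ready=2, cand_testing=3)))  # True
print(can_signal_ready(6_000_000, Counts(validators=90, cand_ready=3, cand_testing=0)))  # False
```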

Validator Active Set

From	To	Timing	Condition	Trigger
ValActive	ValInactive	Epoch	Below Top 50 by stake	System tx
ValInactive	ValReady	Anytime	Over `MinStake`	User tx
ValReady	ValInactive	Anytime	-	User tx
ValReady	ValInactive	Epoch	Below Top 50 by stake	System tx
ValReady	ValActive	Epoch	Top 50 by stake	System tx

Maintenance & Recovery

From	To	Timing	Condition	Trigger
ValActive	ValPaused	Anytime	`ValPaused count < ValPausedSlotLimit` AND request by self for maintenance	User tx
ValActive	ValPaused	Anytime	`ValPaused count < ValPausedSlotLimit` AND PFS is less than `PFS_THRESHOLD` (minor violation)	System tx
ValPaused	ValActive	Anytime	-	User tx
ValPaused	ValInactive	Anytime	`paused duration >= ValPausedTimeout`	System tx
ValPaused	ValInactive	Epoch	Below Top 50 by stake	System tx

Exit & Offboarding

From	To	Timing	Condition	Trigger
ValActive	ValExiting	Anytime	`ValExiting count < ValExitingSlotLimit` AND request by self for offboarding	User tx
ValActive	ValExiting	Anytime	`ValExiting count < ValExitingSlotLimit` AND PFS ≥ `PFS_THRESHOLD` (severe violation)	System tx
ValActive	ValExiting	Anytime	`ValExiting count < ValExitingSlotLimit` AND staking amount below `MinStake` (severe violation)	System tx
ValPaused	ValExiting	Anytime	`ValExiting count < ValExitingSlotLimit` AND request by self for offboarding	User tx
ValExiting	ValInactive	Epoch	-	System tx
ValReady	CandInactive	Anytime	`idle duration >= ValIdleTimeout`	System tx
ValInactive	CandInactive	Anytime	`idle duration >= ValIdleTimeout`	System tx
ValInactive	CandInactive	Anytime	-	User tx
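
The system-triggered penalties for a `ValActive` validator in the two tables above can be summarized in one function. This is only a sketch: `PFS_THRESHOLD` is still TBD, so it is passed in as a parameter; what counts as a minor violation is defined in KIP-227, so it is taken as a boolean input; and returning `None` when the slot limit is reached is an assumption, since that case is not specified here. All names are hypothetical.

```python
from typing import Optional

MIN_STAKE = 5_000_000  # sample MinStake, in KAIA


def penalty_target(
    *,
    pfs: int,
    pfs_threshold: int,
    stake: int,
    minor_violation: bool,
    paused_count: int,
    paused_limit: int,
    exiting_count: int,
    exiting_limit: int,
) -> Optional[str]:
    """Return the state a ValActive validator is moved to, or None for no transition."""
    severe = pfs >= pfs_threshold or stake < MIN_STAKE
    if severe:
        # Severe violation: exit, but only while a ValExiting slot is free.
        return "ValExiting" if exiting_count < exiting_limit else None
    if minor_violation:
        # Minor violation: pause, but only while a ValPaused slot is free.
        return "ValPaused" if paused_count < paused_limit else None
    return None
```

For example, with a hypothetical threshold of 10, a validator with `pfs=12` and a free exiting slot would be moved to `ValExiting`, while one with `pfs=3` and `minor_violation=True` would be moved to `ValPaused`.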

Epoch Transition

Epoch transitions occur at the start of the first block of the next epoch (block.number % VRankEpoch == 0), as part of the block processing logic. That is, the transition for the epoch covering blocks [N, N+VRankEpoch-1] is executed at the start of block N+VRankEpoch. The following pseudocode defines the transition ordering:

def process_epoch_transition():
    # T1: Clear transitional states
    for validator in get_validators_by_state(ValExiting):
        transition(validator, ValInactive)

    # T2: Evaluate VRank for candidates in testing
    # Failed candidates return to CandInactive, passed candidates are marked for promotion
    passed_candidates = []
    for candidate in get_candidates_by_state(CandTesting):
        if not passed_vrank(candidate):
            transition(candidate, CandInactive)
        else:
            passed_candidates.append(candidate)

    # Build the eligible validator pool for the top 50 calculation
    # Pool includes: current active, ready, paused validators, and passed candidates
    # ValActive validators can be assumed to hold at least MinStake; otherwise they
    # would already be in ValExiting due to the staking violation
    # Entities with stake below MinStake are excluded later by is_top50()
    eligible_pool = [
        *get_validators_by_state(ValActive),
        *get_validators_by_state(ValReady),
        *get_validators_by_state(ValPaused),
        *passed_candidates
    ]

    # Determine top 50 by stake (tie-break by address for determinism just in case)
    eligible_pool.sort(key=lambda v: (v.stake, v.address), reverse=True)
    top50 = set(eligible_pool[:ActiveValidatorCount])

    # T3a & T3b: Promote or demote based on top 50
    for entity in eligible_pool:
        if is_top50(entity, top50):
            # T3a: Promote to ValActive (entities in top 50)
            if entity.state in (ValReady, CandTesting):
                transition(entity, ValActive)
            # ValActive stays ValActive
            # ValPaused stays ValPaused (requires voluntary recovery)
        else:
            # T3b: Demote to ValInactive (entities below top 50)
            if entity.state in (ValActive, ValPaused, CandTesting, ValReady):
                transition(entity, ValInactive)

    for candidate in get_candidates_by_state(CandReady):
        # T4a: Start new testing period for ready candidates
        if candidate.stake >= MinStake:
            transition(candidate, CandTesting)
        # T4b: Demote to CandInactive if below MinStake
        else:
            transition(candidate, CandInactive)

def is_top50(entity, top50):
    return entity in top50 and entity.stake >= MinStake

Ordering Rationale:

T1 (ValExiting → ValInactive): Clear exiting validators first to free up slots and ensure they don’t affect top 50 calculation.
T2 (CandTesting evaluation): Evaluate VRank before calculating top 50 since passed candidates become eligible for the active set.
T3a (Promote to ValActive): Entities in top 50 are promoted. ValReady and passed CandTesting transition to ValActive. ValPaused stays paused (recovery is voluntary).
T3b (Demote to ValInactive): Entities below top 50 are demoted. ValActive, ValPaused, ValReady, and passed CandTesting transition to ValInactive.
T4a (CandReady → CandTesting): Start testing last, after all other transitions are complete, so new testers don’t affect the current epoch’s calculations.
T4b (CandReady → CandInactive): Demote to CandInactive if below MinStake.
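
The ordering above can be exercised with a small runnable model. This is a sketch under simplifying assumptions: states are plain strings, the VRank result is a boolean field, `ActiveValidatorCount` is shrunk to 2 to keep the example short, and `Entity` is a hypothetical type rather than the client's representation.

```python
from dataclasses import dataclass

MIN_STAKE = 5_000_000
ACTIVE_VALIDATOR_COUNT = 2  # shrunk from the sample value of 50 for the demo


@dataclass
class Entity:
    address: str
    stake: int
    state: str
    passed_vrank: bool = True  # only meaningful while in CandTesting


def process_epoch_transition(entities: list[Entity]) -> None:
    def in_state(state: str) -> list[Entity]:
        return [e for e in entities if e.state == state]

    # T1: ValExiting -> ValInactive
    for e in in_state("ValExiting"):
        e.state = "ValInactive"

    # T2: failed candidates return to CandInactive, passed ones join the pool
    passed = []
    for e in in_state("CandTesting"):
        if e.passed_vrank:
            passed.append(e)
        else:
            e.state = "CandInactive"

    # Rank the pool by (stake, address), highest first, and keep the top slots
    pool = in_state("ValActive") + in_state("ValReady") + in_state("ValPaused") + passed
    pool.sort(key=lambda e: (e.stake, e.address), reverse=True)
    top = {e.address for e in pool[:ACTIVE_VALIDATOR_COUNT] if e.stake >= MIN_STAKE}

    # T3a / T3b: promote entities in the top set, demote the rest
    for e in pool:
        if e.address in top:
            if e.state in ("ValReady", "CandTesting"):
                e.state = "ValActive"
            # ValActive and ValPaused keep their state
        else:
            e.state = "ValInactive"

    # T4a / T4b: ready candidates start testing or fall back to CandInactive
    for e in in_state("CandReady"):
        e.state = "CandTesting" if e.stake >= MIN_STAKE else "CandInactive"


entities = [
    Entity("0xa", 9_000_000, "ValActive"),
    Entity("0xb", 6_000_000, "ValActive"),
    Entity("0xc", 8_000_000, "CandTesting"),
    Entity("0xd", 7_000_000, "ValExiting"),
    Entity("0xe", 5_500_000, "CandReady"),
    Entity("0xf", 1_000_000, "CandReady"),
]
process_epoch_transition(entities)
print({e.address: e.state for e in entities})
```

In this run, `0xc` passes testing and displaces `0xb` from the two active slots, `0xd` finishes exiting without entering the ranking, `0xe` starts testing, and `0xf` returns to `CandInactive` because it is below `MinStake`.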

The following diagram illustrates all valid state transition paths:

flowchart LR
    subgraph Candidates["Candidate States"]
        Unknown
        CandInactive
        CandReady
        CandTesting
    end

    subgraph Validators["Validator States"]
        ValInactive
        ValReady
        ValActive
        ValPaused
        ValExiting
    end

    %% Registration & Deregistration
    Unknown -->|"register"| CandInactive
    CandInactive -->|"deregister"| Unknown

    %% Candidate Lifecycle
    CandInactive -->|"signal ready"| CandReady
    CandReady -->|"cancel ready"| CandInactive
    CandReady -.->|"T4a: start testing"| CandTesting
    CandReady -.->|"T4b: below MinStake"| CandInactive
    CandTesting -.->|"T2: failed VRank"| CandInactive
    CandTesting -.->|"T3a: pass & top 50"| ValActive
    CandTesting -.->|"T3b: pass & below top 50"| ValInactive

    %% Validator Active Set
    ValActive -.->|"T3b: below top 50"| ValInactive
    ValInactive -->|"signal ready"| ValReady
    ValReady -->|"cancel ready"| ValInactive
    ValReady -.->|"T3b: below top 50"| ValInactive
    ValReady -.->|"T3a: top 50"| ValActive

    %% Maintenance & Recovery
    ValActive -->|"self maintenance"| ValPaused
    ValActive -->|"minor VRank violation"| ValPaused
    ValPaused -->|"recovered"| ValActive
    ValPaused -->|"paused timeout"| ValInactive
    ValPaused -.->|"T3b: below top 50"| ValInactive

    %% Exit & Offboarding
    ValPaused -->|"self offboarding"| ValExiting
    ValActive -->|"self offboarding"| ValExiting
    ValActive -->|"severe VRank violation"| ValExiting
    ValExiting -.->|"T1: next epoch"| ValInactive
    ValReady -->|"idle timeout"| CandInactive
    ValInactive -->|"idle timeout"| CandInactive
    ValInactive -->|"self offboarding"| CandInactive

Solid arrows: Anytime transitions
Dotted arrows: Epoch interval transitions

Note: Any state transition not specified above is ILLEGAL and must be rejected by the protocol.
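
One way to enforce this rule is an explicit allow-list of (from, to) pairs taken from the transition tables. The sketch below checks only the pair itself, not the timing, condition, or trigger; `check_transition` and `IllegalTransition` are hypothetical names.

```python
ALLOWED_TRANSITIONS = {
    # Registration & Deregistration
    ("Unknown", "CandInactive"),
    ("CandInactive", "Unknown"),
    # Candidate Lifecycle
    ("CandInactive", "CandReady"),
    ("CandReady", "CandInactive"),
    ("CandReady", "CandTesting"),
    ("CandTesting", "CandInactive"),
    ("CandTesting", "ValActive"),
    ("CandTesting", "ValInactive"),
    # Validator Active Set
    ("ValActive", "ValInactive"),
    ("ValInactive", "ValReady"),
    ("ValReady", "ValInactive"),
    ("ValReady", "ValActive"),
    # Maintenance & Recovery
    ("ValActive", "ValPaused"),
    ("ValPaused", "ValActive"),
    ("ValPaused", "ValInactive"),
    # Exit & Offboarding
    ("ValActive", "ValExiting"),
    ("ValPaused", "ValExiting"),
    ("ValExiting", "ValInactive"),
    ("ValReady", "CandInactive"),
    ("ValInactive", "CandInactive"),
}


class IllegalTransition(Exception):
    """Raised for any (from, to) pair that is not listed in the tables."""


def check_transition(src: str, dst: str) -> None:
    if (src, dst) not in ALLOWED_TRANSITIONS:
        raise IllegalTransition(f"{src} -> {dst} is not a specified transition")


check_transition("ValActive", "ValPaused")  # allowed, returns None
try:
    check_transition("CandReady", "ValActive")  # candidates must pass testing first
except IllegalTransition as err:
    print(err)
```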

Rationale

Penalty for VRank violation

Currently, the penalty for a VRank violation is either temporary (ValPaused) or permanent (ValExiting), depending on the severity defined in KIP-227. A severe violation occurs when PFS is greater than or equal to PFS_THRESHOLD (i.e., excessive round changes due to proposal failures), while a minor violation is one where PFS is still below PFS_THRESHOLD. Additionally, a staking violation, where a validator in ValActive unstakes below MinStake, is also treated as a severe violation and triggers a transition to ValExiting. These penalties suspend rewards but do not involve slashing or an extended lockup period, which would be more direct penalties. During the permissionless transition, we expect many validators to need an onboarding period before they can operate a validator node smoothly. Enforcing strict penalties from the early stage could cause many validators to be offboarded, which is undesirable and could eventually lead to network instability. The current transition conditions are designed to prevent this and to allow a sufficient transition period. If the current penalties turn out to be insufficient after the permissionless transition, stricter penalties such as slashing can be introduced.

`ValInactive` state

Given the nature of BFT-based consensus (which still has room for upgrades), the network needs an appropriate number of validators to operate. Without a competition model, no new validator can be expected to join once the ValActive slots are full. To enable staking competition while keeping the network stable, we introduce the intermediate state ValInactive, which gives validators below the top 50 by stake enough time to stake more KAIA and join the active set.

`ValExiting` state

A validator that wants to exit the network can voluntarily submit a request to move itself into the ValExiting state and stop participating in consensus. At the start of the next epoch, it automatically transitions to ValInactive, from which it can offboard freely. This prevents validators from offboarding too rapidly, which could lead to network instability. If validators were offboarded directly, it would be hard to restrict how many leave at once; routing them through ValExiting makes that number subject to ValExitingSlotLimit.

Timeout and Slot Limit

Each validator state has specific timeout and/or slot limit constraints to prevent permanent slot occupation and ensure network stability:

ValInactive and ValReady (Idle Timeout): Without an idle timeout, validators could remain in the ValInactive or ValReady state indefinitely, occupying slots and preventing new validators from joining the network.
ValPaused (Timeout and Slot Limit): The timeout prevents extended maintenance periods, while the slot limit ensures enough validators remain active for consensus. Additionally, the total pausable time per epoch is limited to prevent abuse: validators who exceed the maximum allowed paused time within an epoch will be penalized through a VRank violation as defined in KIP-227. This applies regardless of whether the pause was voluntary or involuntary.
ValExiting (Slot Limit): This prevents too many validators from exiting simultaneously, which could destabilize the network. The slot limit ensures gradual offboarding.

Backwards Compatibility

This KIP introduces a new framework for candidate and validator states, and all participants must follow the rules it defines.

Security Considerations

Node state queried on-chain cannot be trusted for the current block, because the on-chain state is not block-level atomic. For example, given the following transaction ordering:

block N:
  tx0: read  validator.state (returns ValActive)
  tx1: write validator.state = ValPaused (transition to ValPaused)
  tx2: read  validator.state (returns ValPaused)

tx2 reads the state as ValPaused, although the validator is still treated as ValActive in block N. Smart contract developers are strongly discouraged from relying on the on-chain node state for critical decisions.

References

KIP-227: Candidate and Validator Evaluation - Defines VRank criteria and evaluation rules
EIP-4788: Beacon block root in the EVM - Source of the system call convention that System tx follows
