KIP 286: Permissionless Validator Lifecycle

Author: Lewis, Ian, Ollie, Lake, and Joseph
Discussions-To: https://github.com/kaiachain/kips/issues/86
Status: Draft
Type: Core
Created: 2026-01-19
Requires: 227

Abstract

This KIP introduces an automatic state transition framework for candidates and validators in the permissionless Kaia network. It defines 9 states (2 registration states, 2 candidate states, 5 validator states) and their allowed transitions, enabling decentralized validator lifecycle management without manual intervention from the Kaia team.

Motivation

The current Kaia network operates in a permissioned manner where the Kaia team manually manages validator states. As the network transitions to permissionless operation, an automatic state management system becomes necessary.

With KIP-227 introducing VRank for quantitative evaluation of validators, this KIP establishes the state machine that governs:

  • How candidates join the network through VRank testing
  • How validators participate in consensus based on stake ranking
  • How validators are paused or removed upon VRank violations
  • How validators can voluntarily exit or perform maintenance

This framework enables trustless validator lifecycle management while maintaining network stability through structured state transitions.

Specification

Parameters

The following parameters are used in this KIP:

| Parameter | Description | Sample Value |
| --- | --- | --- |
| MaxNodeCount | Maximum number of nodes in non-Registered states (CandReady, CandTesting, ValActive, ValReady, ValInactive, ValPaused, ValExiting) | 100 |
| MaxValActivePausedCount | Maximum allowed number of nodes in ValActive or ValPaused (used as the epoch slot competition limit) | 50 |
| MaxCandReadyCount | Maximum number of candidates that can be in CandReady | 3 |
| MinStake | Minimum staking amount required to enter the consensus | 5,000,000 KAIA |
| ValPausedTimeout | Maximum duration a validator can remain in ValPaused before transitioning to ValInactive | 8 hours |
| ValIdleTimeout | Maximum duration a validator can remain in ValInactive or ValReady before transitioning to Registered | 30 days |
| MinActiveCount | Minimum number of ValActive validators required for BFT liveness | n if n < 4, else ceil(2n/3), where n = ValActive count at epoch start |
| ValPausedSlotLimit | Maximum number of validators that can be in ValPaused | 0 if n < 4, else ceil(floor(n/3) / 2), where n = ValActive count at epoch start |
| ValExitingSlotLimit | Maximum number of validators that can be in ValExiting | 0 if n < 4, else ceil(floor(n/3) / 2), where n = ValActive count at epoch start |
| VRankEpoch (Epoch) | Epoch interval for VRank, in blocks | 86400 |
| PFS_THRESHOLD | Threshold for proposal failure score (PFS) per epoch. PFS ≥ PFS_THRESHOLD indicates severe violation (defined in KIP-227) | 2 |
| CFS_THRESHOLD | Threshold for candidate failure score (CFS) during VRank testing. CFS ≥ CFS_THRESHOLD indicates failing the testing (defined in KIP-227) | 300 |
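The three derived parameters (MinActiveCount, ValPausedSlotLimit, ValExitingSlotLimit) can be worked out with integer arithmetic. The following is a minimal sketch, not the reference implementation; the function names `min_active_count` and `slot_limit` are hypothetical and chosen only for illustration.

```python
def min_active_count(n: int) -> int:
    """MinActiveCount: n if n < 4, else ceil(2n/3)."""
    # (2n + 2) // 3 equals ceil(2n / 3) for non-negative integers.
    return n if n < 4 else (2 * n + 2) // 3


def slot_limit(n: int) -> int:
    """ValPausedSlotLimit and ValExitingSlotLimit: 0 if n < 4, else ceil(floor(n/3) / 2)."""
    # (x + 1) // 2 equals ceil(x / 2) for non-negative integers.
    return 0 if n < 4 else (n // 3 + 1) // 2


# n is the ValActive count at epoch start.
for n in (3, 4, 10, 50):
    print(n, min_active_count(n), slot_limit(n))
```

With the sample value of 50 active validators, this gives MinActiveCount = 34 and a slot limit of 8 for each of ValPaused and ValExiting.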

Note: The permissionless hardfork block number MUST be a multiple of VRankEpoch. Otherwise the first post-fork epoch cannot accumulate scores over a full VRankEpoch window, and any query into the pre-fork portion of that epoch has no VRank data to return.
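The alignment requirement in the note can be checked with a single modulo operation. This is a sketch under the assumption that VRankEpoch is measured in blocks (as the epoch condition block.number % VRankEpoch == 0 suggests); the helper names are hypothetical.

```python
VRANK_EPOCH = 86400  # sample value from the parameter table, in blocks


def is_aligned_fork_block(fork_block: int, epoch: int = VRANK_EPOCH) -> bool:
    """True if the hardfork block number is a multiple of VRankEpoch."""
    return fork_block % epoch == 0


def pre_fork_blocks_in_first_epoch(fork_block: int, epoch: int = VRANK_EPOCH) -> int:
    """Number of blocks in the epoch containing fork_block that precede the fork.

    These blocks would have no VRank data; the count is 0 for an aligned fork block.
    """
    return fork_block % epoch
```

For example, a fork at block 3 * 86400 is aligned, while a fork at block 3 * 86400 + 100 would leave 100 blocks of its epoch without VRank data.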

Overview

The framework defines 9 states divided into three categories:

  • Registration States: Unknown, Registered
  • Candidate States: CandReady, CandTesting
  • Validator States: ValActive, ValReady, ValInactive, ValPaused, ValExiting

State transitions may occur either at epoch intervals or at arbitrary blocks. Any event that occurs at block N takes effect starting from block N+1. For example, if a validator in ValActive state requests a transition to ValPaused at block N, the validator remains in ValActive for block N, and transitions to ValPaused starting from block N+1.
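The N+1 rule can be illustrated by recording each state change together with the block from which it becomes effective. This is a minimal sketch; `StateHistory` and its methods are hypothetical names, and the class assumes events are recorded in increasing block order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StateHistory:
    """Records (effective_from_block, state) pairs for one validator."""
    changes: List[Tuple[int, str]] = field(default_factory=list)

    def record_event(self, event_block: int, new_state: str) -> None:
        # An event at block N takes effect starting from block N + 1.
        self.changes.append((event_block + 1, new_state))

    def state_at(self, block: int) -> Optional[str]:
        """State that applies to the given block, or None if nothing is effective yet."""
        current = None
        for effective_from, state in self.changes:
            if effective_from <= block:
                current = state
        return current


history = StateHistory()
history.record_event(99, "ValActive")   # effective from block 100
history.record_event(200, "ValPaused")  # requested at block 200, effective from 201
```

Here `history.state_at(200)` still returns ValActive, and `history.state_at(201)` returns ValPaused, matching the example in the text.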

Only validators in ValActive state participate in consensus and receive rewards. The active validator set consists of the top 50 validators by staking amount at the epoch interval block.

Idle Timer: Validators in ValInactive or ValReady states are subject to ValIdleTimeout. The idle timer accumulates across both states and only resets when the validator transitions to Registered or ValActive. Transitions between ValInactive and ValReady do not reset the timer. This mechanism prevents a validator from remaining indefinitely in the ValInactive/ValReady states.
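The idle timer behavior can be sketched as follows. This example assumes durations are measured in seconds from timestamps, which this KIP does not specify; the `IdleTimer` class is hypothetical.

```python
from typing import Optional

VAL_IDLE_TIMEOUT = 30 * 24 * 60 * 60  # 30 days (sample value), assumed to be in seconds
IDLE_STATES = {"ValInactive", "ValReady"}


class IdleTimer:
    def __init__(self) -> None:
        self.idle_since: Optional[int] = None

    def on_transition(self, new_state: str, now: int) -> None:
        if new_state in IDLE_STATES:
            # Start the timer only on first entry; moving between
            # ValInactive and ValReady keeps the original start time.
            if self.idle_since is None:
                self.idle_since = now
        else:
            # Leaving the idle states clears the timer. From an idle state,
            # the only such targets are Registered and ValActive.
            self.idle_since = None

    def expired(self, now: int) -> bool:
        return self.idle_since is not None and now - self.idle_since >= VAL_IDLE_TIMEOUT


timer = IdleTimer()
timer.on_transition("ValInactive", now=0)
timer.on_transition("ValReady", now=1000)  # does not reset the timer
```

After these two transitions the timer still counts from time 0, so the validator times out at 30 days regardless of how often it toggled between the two states.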

Registration States

| State | Description |
| --- | --- |
| Unknown | Conceptual state representing entities not registered in the system. |
| Registered | State representing entities that are registered in the system. |

Candidate States

| State | Description |
| --- | --- |
| CandReady | Candidate has signaled readiness to participate in VRank testing at the next epoch. |
| CandTesting | Candidate undergoing VRank testing to prove infrastructure reliability. |

Validator States

| State | Description |
| --- | --- |
| ValActive | Active validator participating in consensus and earning rewards (= committee). Must be in top 50 by staking amount. |
| ValReady | Validator has signaled readiness to become ValActive. Waiting for top 50 position. |
| ValInactive | Inactive validator not participating in consensus. Need to signal readiness or exit to avoid timeout. |
| ValPaused | Validator in maintenance/recovery mode. May be voluntary or forced by VRank violation. |
| ValExiting | Transitional state for current epoch. Becomes ValInactive at next epoch. |

State Transitions

State transitions are categorized by timing:

  • Epoch Interval: Evaluated and executed only when a new epoch starts
  • Anytime: Can occur immediately upon transaction during any block

Transition Conditions

A User tx is a transaction initiated by the user, while a System tx is a system operation initiated by the core client, following the same convention as EIP-4788.

Registration & Deregistration

| From | To | Timing | Condition | Trigger |
| --- | --- | --- | --- | --- |
| Unknown | Registered | Anytime | - | User tx |
| Registered | Unknown | Anytime | - | User tx |

Candidate Lifecycle

| From | To | Timing | Condition | Trigger |
| --- | --- | --- | --- | --- |
| Registered | CandReady | Anytime | Over MinStake AND CandReady count < MaxCandReadyCount AND Val* count + CandReady count + CandTesting count < MaxNodeCount | User tx |
| CandReady | Registered | Anytime | - | User tx |
| CandReady | Registered | Epoch | Below MinStake | System tx |
| CandReady | CandTesting | Epoch | Over MinStake | System tx |
| CandTesting | Registered | Epoch | Fail VRank (CFS ≥ CFS_THRESHOLD) | System tx |
| CandTesting | ValActive | Epoch | Pass VRank AND Top 50 by stake | System tx |
| CandTesting | ValInactive | Epoch | Pass VRank AND Below Top 50 | System tx |
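The admission condition for Registered → CandReady combines a stake check with two capacity checks. The sketch below is illustrative only: it reads "Over MinStake" as stake ≥ MinStake (an assumption, chosen to match the epoch pseudocode in this KIP), and the function name and the `counts` dictionary are hypothetical.

```python
from typing import Dict

MIN_STAKE = 5_000_000        # sample value, in KAIA
MAX_CAND_READY_COUNT = 3     # sample value
MAX_NODE_COUNT = 100         # sample value

VAL_STATES = ("ValActive", "ValReady", "ValInactive", "ValPaused", "ValExiting")


def can_enter_cand_ready(stake: int, counts: Dict[str, int]) -> bool:
    """Check the Registered -> CandReady condition given per-state node counts."""
    val_count = sum(counts.get(state, 0) for state in VAL_STATES)
    cand_ready = counts.get("CandReady", 0)
    cand_testing = counts.get("CandTesting", 0)
    return (
        stake >= MIN_STAKE  # assumption: "Over MinStake" is inclusive
        and cand_ready < MAX_CAND_READY_COUNT
        and val_count + cand_ready + cand_testing < MAX_NODE_COUNT
    )
```

For instance, with 50 ValActive nodes and 2 CandReady candidates, a node staking exactly 5,000,000 KAIA would be admitted under these assumptions, while a third CandReady candidate already present would block the request.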

Validator Active Set

| From | To | Timing | Condition | Trigger |
| --- | --- | --- | --- | --- |
| ValActive | ValInactive | Epoch | Below MinStake | System tx |
| ValActive | ValInactive | Epoch | Below Top 50 by stake | System tx |
| ValPaused | ValInactive | Epoch | Below MinStake | System tx |
| ValInactive | ValReady | Anytime | Over MinStake | User tx |
| ValReady | ValInactive | Anytime | - | User tx |
| ValReady | ValInactive | Anytime | Staking amount below MinStake | System tx |
| ValReady | ValInactive | Epoch | Below MinStake | System tx |
| ValReady | ValInactive | Epoch | Below Top 50 by stake | System tx |
| ValReady | ValActive | Epoch | Top 50 by stake | System tx |

Maintenance & Recovery

| From | To | Timing | Condition | Trigger |
| --- | --- | --- | --- | --- |
| ValActive | ValPaused | Anytime | ValPaused count < ValPausedSlotLimit AND request by self for maintenance | User tx |
| ValActive | ValPaused | Anytime | ValPaused count < ValPausedSlotLimit AND ValActive count > MinActiveCount AND 0 < PFS < PFS_THRESHOLD (minor violation) | System tx |
| ValPaused | ValActive | Anytime | - | User tx |
| ValPaused | ValInactive | Anytime | paused duration >= ValPausedTimeout | System tx |
| ValPaused | ValInactive | Epoch | Below Top 50 by stake | System tx |

Exit & Offboarding

| From | To | Timing | Condition | Trigger |
| --- | --- | --- | --- | --- |
| ValActive | ValExiting | Anytime | ValExiting count < ValExitingSlotLimit AND request by self for offboarding | User tx |
| ValActive | ValExiting | Anytime | ValExiting count < ValExitingSlotLimit AND ValActive count > MinActiveCount AND PFS ≥ PFS_THRESHOLD (severe violation) | System tx |
| ValActive | ValExiting | Anytime | ValExiting count < ValExitingSlotLimit AND ValActive count > MinActiveCount AND staking amount below MinStake | System tx |
| ValPaused | ValExiting | Anytime | ValExiting count < ValExitingSlotLimit AND request by self for offboarding | User tx |
| ValPaused | ValExiting | Anytime | ValExiting count < ValExitingSlotLimit AND staking amount below MinStake | System tx |
| ValExiting | ValInactive | Epoch | - | System tx |
| ValReady | Registered | Anytime | idle duration >= ValIdleTimeout | System tx |
| ValInactive | Registered | Anytime | idle duration >= ValIdleTimeout | System tx |
| ValInactive | Registered | Anytime | - | User tx |

Transition Ordering

System transitions are processed in the following order at every block:

  1. Epoch transition (epoch blocks only): slot competition, VRank evaluation, candidate promotion
  2. Violation transition (every block): MinStake and PFS violations with slot-limited demotions
  3. Timeout transition (every block): idle and pause timeout enforcement

This ordering ensures that violation transitions use the post-epoch epochVACount for correct slot limit calculations. After an epoch transition changes the active validator count, the new epochVACount (epochVACount = len(ValActive)) is used to compute getSlotLimitsFor(epochVACount) for the subsequent violation checks.
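The per-block ordering can be sketched as a function that lists which system transition phases run at a given block. This is only an illustration of the ordering described above; the function name and the phase labels are hypothetical.

```python
from typing import List

VRANK_EPOCH = 86400  # sample value, in blocks


def system_transition_phases(block_number: int, epoch: int = VRANK_EPOCH) -> List[str]:
    """Return the system transition phases for a block, in execution order."""
    phases = []
    if block_number % epoch == 0:
        # Epoch transition runs first so that later phases see the
        # post-epoch ValActive count when computing slot limits.
        phases.append("epoch")
    phases.append("violation")
    phases.append("timeout")
    return phases
```

On an epoch block all three phases run in order; on any other block only the violation and timeout phases run.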

Epoch Transition

Epoch transitions occur at the start of the first block of each new epoch (block.number % VRankEpoch == 0), as part of the block processing logic. That is, the epoch transition for the epoch [N, N+VRankEpoch-1] is executed at the start of block N+VRankEpoch. The following pseudocode defines the transition ordering:

def process_epoch_transition():
    # T1: Clear transitional states
    for validator in get_validators_by_state(ValExiting):
        transition(validator, ValInactive)

    # T2: Evaluate VRank for candidates in testing
    # Failed candidates return to Registered, passed candidates are marked for promotion
    passed_candidates = []
    for candidate in get_candidates_by_state(CandTesting):
        if not passed_vrank(candidate):
            transition(candidate, Registered)
        else:
            passed_candidates.append(candidate)

    # T3b (below MinStake): demote VA/VR/VP below MinStake directly to ValInactive.
    # This runs before the top-50 competition so the epochVACount recomputation reflects the correct count.
    # No slot limit check — unconditional demotion.
    for validator in [*get_validators_by_state(ValActive), *get_validators_by_state(ValReady), *get_validators_by_state(ValPaused)]:
        if validator.stake < MinStake:
            transition(validator, ValInactive)

    # Build eligible validator pool for top 50 calculation.
    # Pool includes: current active, ready, paused validators (those not just demoted), and passed candidates
    # with stake >= MinStake.
    eligible_pool = [
        v for v in [
            *get_validators_by_state(ValActive),
            *get_validators_by_state(ValReady),
            *get_validators_by_state(ValPaused),
            *passed_candidates
        ] if v.stake >= MinStake
    ]

    # Determine top 50 by stake (tie-break by address for determinism just in case)
    eligible_pool.sort(key=lambda v: (v.stake, v.address), reverse=True)
    top50 = set(eligible_pool[:MaxValActivePausedCount])

    # T3a & T3b: Promote or demote based on top 50
    for entity in eligible_pool:
        if entity in top50:
            # T3a: Promote to ValActive (entities in top 50)
            if entity.state in (ValReady, CandTesting):
                transition(entity, ValActive)
            # ValActive stays ValActive
            # ValPaused stays ValPaused (requires voluntary recovery)
        else:
            # T3b: Demote to ValInactive (entities below top 50 by slot competition)
            if entity.state in (ValActive, ValPaused, CandTesting, ValReady):
                transition(entity, ValInactive)

    for candidate in get_candidates_by_state(CandReady):
        # T4a: Start new testing period for ready candidates
        if candidate.stake >= MinStake:
            transition(candidate, CandTesting)
        # T4b: Demote to Registered if below MinStake
        else:
            transition(candidate, Registered)

Ordering Rationale:

  1. T1 (ValExiting → ValInactive): Clear exiting validators first to free up slots and ensure they don’t affect top 50 calculation.

  2. T2 (CandTesting evaluation): Evaluate VRank before calculating top 50 since passed candidates become eligible for the active set.

  3. T3a (Promote to ValActive): Entities in top 50 are promoted. ValReady and passed CandTesting transition to ValActive. ValPaused stays paused (recovery is voluntary).

  4. T3b (Demote to ValInactive): Two cases: (a) ValActive, ValPaused, ValReady below MinStake are demoted unconditionally before the top-50 competition; (b) ValActive, ValPaused, ValReady, and passed CandTesting that are in the eligible pool but below top 50 are demoted after the competition.

  5. T4a (CandReady → CandTesting): Start testing last, after all other transitions are complete, so new testers don’t affect the current epoch’s calculations.

  6. T4b (CandReady → Registered): Demote to Registered if below MinStake.
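The slot competition in steps T3a and T3b can be made concrete with a small runnable sketch. It covers only the ranking step and assumes that T1, T2, and the below-MinStake demotions have already run; the `Node` type and the `slot_competition` function are hypothetical, and nodes below MinStake are simply left out of the result.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_STAKE = 5_000_000             # sample value, in KAIA
MAX_VAL_ACTIVE_PAUSED_COUNT = 50  # sample value


@dataclass(frozen=True)
class Node:
    address: str
    stake: int
    state: str


def slot_competition(pool: List[Node], slots: int = MAX_VAL_ACTIVE_PAUSED_COUNT) -> Dict[str, str]:
    """Return address -> next state for every node in the eligible pool."""
    eligible = [n for n in pool if n.stake >= MIN_STAKE]
    # Highest stake first; ties are broken by address for determinism.
    eligible.sort(key=lambda n: (n.stake, n.address), reverse=True)
    result = {}
    for rank, node in enumerate(eligible):
        if rank < slots:
            # T3a: ValReady and passed CandTesting are promoted;
            # ValActive and ValPaused keep their current state.
            promoted = node.state in ("ValReady", "CandTesting")
            result[node.address] = "ValActive" if promoted else node.state
        else:
            # T3b: everything ranked below the slot limit becomes ValInactive.
            result[node.address] = "ValInactive"
    return result


pool = [
    Node("0xa", 9_000_000, "ValActive"),
    Node("0xb", 10_000_000, "ValReady"),
    Node("0xc", 9_000_000, "ValPaused"),
    Node("0xd", 6_000_000, "CandTesting"),
    Node("0xe", 4_000_000, "ValReady"),  # below MinStake, not in the eligible pool
]
result = slot_competition(pool, slots=2)
```

With only two slots, `0xb` is promoted to ValActive, `0xc` wins the stake tie against `0xa` on address and stays ValPaused, and `0xa` and `0xd` become ValInactive.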

Violation Transition

Violation transitions run at every block (after epoch transition if on an epoch block). They handle MinStake and PFS violations with slot-limited demotions. Validators are processed in deterministic address order to ensure consistent results across nodes.

def process_violation_transition(sorted_validators):
    # Rule 1: MinStake violation (non-epoch blocks only).
    # At epoch blocks, VA/VR/VP below MinStake are already demoted to ValInactive by T3b in process_epoch_transition.
    for validator in sorted_validators:
        if validator.stake >= MinStake:
            continue
        if validator.state == ValActive:
            # ValActive → ValExiting (if slot available and ValActive > minActiveCount)
            if can_transition(ValExiting):
                transition(validator, ValExiting)
        elif validator.state == ValPaused:
            # ValPaused → ValExiting (if slot available)
            if count_by_state(ValExiting) < maxSlotAvailable:
                transition(validator, ValExiting)
        elif validator.state == ValReady:
            # ValReady → ValInactive (unconditional, not in active set)
            transition(validator, ValInactive)

    # Rule 2: PFS violation (only when proposal failure occurred at this block)
    for validator in sorted_validators:
        if validator.state != ValActive:
            continue
        pfs = get_pfs(validator)
        if pfs >= PFS_THRESHOLD:
            # Severe: ValActive → ValExiting
            if can_transition(ValExiting):
                transition(validator, ValExiting)
        elif pfs > 0:
            # Minor: ValActive → ValPaused
            if can_transition(ValPaused):
                transition(validator, ValPaused)

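def can_transition(target_state):
    # maxSlotAvailable is the slot limit of target_state
    # (ValPausedSlotLimit for ValPaused, ValExitingSlotLimit for ValExiting).
    return count_by_state(target_state) < maxSlotAvailable \
       and count_by_state(ValActive) > minActiveCount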
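The PFS severity split and the slot guard can be expressed as two small pure functions. This is a sketch under the sample value PFS_THRESHOLD = 2; the function names and the explicit count parameters are hypothetical and replace the global lookups used in the pseudocode.

```python
PFS_THRESHOLD = 2  # sample value


def classify_pfs(pfs: int, threshold: int = PFS_THRESHOLD) -> str:
    """Classify a proposal failure score as 'severe', 'minor', or 'none'."""
    if pfs >= threshold:
        return "severe"  # ValActive -> ValExiting, if slots allow
    if pfs > 0:
        return "minor"   # ValActive -> ValPaused, if slots allow
    return "none"


def can_demote(target_count: int, slot_limit: int, active_count: int, min_active_count: int) -> bool:
    """Slot guard: the target state has a free slot and enough validators stay active."""
    return target_count < slot_limit and active_count > min_active_count
```

With 50 active validators (slot limit 8, MinActiveCount 34), a demotion is allowed while fewer than 8 validators occupy the target state and is refused once the active count has dropped to 34.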

Timeout Transition

Timeout transitions run at every block (after violation transition). They enforce idle and pause timeouts.

def process_timeout_transition(validators):
    for validator in validators:
        if validator.state in (ValReady, ValInactive):
            # Idle time accumulates across ValInactive and ValReady
            if idle_duration(validator) >= ValIdleTimeout:
                transition(validator, Registered)
        elif validator.state == ValPaused:
            if paused_duration(validator) >= ValPausedTimeout:
                transition(validator, ValInactive)

The following diagram illustrates all valid state transition paths:

flowchart LR
    subgraph Registration["Registration States"]
        Unknown
        Registered
    end

    subgraph Candidates["Candidate States"]
        CandReady
        CandTesting
    end

    subgraph Validators["Validator States"]
        ValInactive
        ValReady
        ValActive
        ValPaused
        ValExiting
    end

    %% Registration & Deregistration
    Unknown -->|"register"| Registered
    Registered -->|"deregister"| Unknown

    %% Candidate Lifecycle
    Registered -->|"signal ready"| CandReady
    CandReady -->|"cancel ready"| Registered
    CandReady -.->|"T4a: start testing"| CandTesting
    CandReady -.->|"T4b: below MinStake"| Registered
    CandTesting -.->|"T2: failed VRank"| Registered
    CandTesting -.->|"T3a: pass & top 50"| ValActive
    CandTesting -.->|"T3b: pass & below top 50"| ValInactive

    %% Validator Active Set
    ValActive -.->|"T3b: below top 50"| ValInactive
    ValInactive -->|"signal ready"| ValReady
    ValReady -->|"cancel ready"| ValInactive
    ValReady -->|"below MinStake"| ValInactive
    ValReady -.->|"T3b: below top 50"| ValInactive
    ValReady -.->|"T3a: top 50"| ValActive

    %% Maintenance & Recovery
    ValActive -->|"self maintenance"| ValPaused
    ValActive -->|"minor VRank violation"| ValPaused
    ValPaused -->|"recovered"| ValActive
    ValPaused -->|"paused timeout"| ValInactive
    ValPaused -.->|"T3b: below top 50"| ValInactive

    %% Exit & Offboarding
    ValPaused -->|"self offboarding"| ValExiting
    ValPaused -->|"below MinStake"| ValExiting
    ValActive -->|"self offboarding"| ValExiting
    ValActive -->|"severe VRank violation"| ValExiting
    ValExiting -.->|"T1: next epoch"| ValInactive
    ValReady -->|"idle timeout"| Registered
    ValInactive -->|"idle timeout"| Registered
    ValInactive -->|"self offboarding"| Registered
  • Solid arrows: Anytime transitions
  • Dotted arrows: Epoch interval transitions

Note: Any state transition not specified above is ILLEGAL and must be rejected by the protocol.
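One way to enforce this rule is a lookup against the set of (from, to) pairs listed in the transition tables. The sketch below only checks whether a pair is listed; it does not evaluate timing, conditions, or triggers. The names `ALLOWED_TRANSITIONS` and `is_legal` are hypothetical.

```python
# Distinct (from, to) pairs taken from the transition tables in this KIP.
ALLOWED_TRANSITIONS = {
    # Registration & Deregistration
    ("Unknown", "Registered"), ("Registered", "Unknown"),
    # Candidate Lifecycle
    ("Registered", "CandReady"), ("CandReady", "Registered"),
    ("CandReady", "CandTesting"), ("CandTesting", "Registered"),
    ("CandTesting", "ValActive"), ("CandTesting", "ValInactive"),
    # Validator Active Set
    ("ValActive", "ValInactive"), ("ValPaused", "ValInactive"),
    ("ValInactive", "ValReady"), ("ValReady", "ValInactive"),
    ("ValReady", "ValActive"),
    # Maintenance & Recovery
    ("ValActive", "ValPaused"), ("ValPaused", "ValActive"),
    # Exit & Offboarding
    ("ValActive", "ValExiting"), ("ValPaused", "ValExiting"),
    ("ValExiting", "ValInactive"), ("ValReady", "Registered"),
    ("ValInactive", "Registered"),
}


def is_legal(from_state: str, to_state: str) -> bool:
    """True if the (from, to) pair appears in the transition tables."""
    return (from_state, to_state) in ALLOWED_TRANSITIONS
```

For example, `is_legal("ValActive", "ValPaused")` holds, while `is_legal("Registered", "ValActive")` does not, since a registered node must pass through the candidate states first.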

Rationale

Penalty for VRank violation

Currently, the penalty for a VRank violation is either temporary (ValPaused) or permanent (ValExiting), depending on the severity defined in KIP-227. A severe violation occurs when PFS is greater than or equal to PFS_THRESHOLD (i.e., excessive round changes due to proposal failures), while a minor violation occurs when PFS is nonzero but still below PFS_THRESHOLD. Additionally, a staking violation, where a validator in ValActive unstakes below MinStake, is treated as a severe violation and triggers a transition to ValExiting. These penalties suspend rewards but do not slash stake or extend the lockup period, which would be more direct penalties. During the permissionless transition, we expect many validators to need an onboarding period before they can operate a validator node smoothly. Enforcing strict penalties from the early stage could offboard many validators, which is undesirable and could eventually lead to network instability. The current transition conditions are designed to prevent this and to allow a sufficient transition period. If the current penalties turn out to be insufficient after the permissionless transition, stricter penalties such as slashing can be introduced.

ValInactive state

Given the nature of BFT-based consensus (where there is room for upgrade), the network needs an appropriate number of validators to operate. Without a competition model, no new validator could be expected to join the network once the ValActive slots are full. To enable staking competition while keeping the network stable, we introduced the intermediate state ValInactive, which gives validators below the top 50 by stake enough time to stake more KAIA and join the network.

ValExiting state

If a validator wants to exit the network, it can voluntarily submit a request to move itself to the ValExiting state and stop participating in consensus. At the next epoch, it is automatically transitioned to the ValInactive state, from which it can be offboarded freely. This prevents validators from being offboarded too rapidly, which could lead to network instability. If validators were offboarded directly, it would be hard to restrict the number of validators being offboarded (= in ValExiting state).

Timeout and Slot Limit

Each validator state has specific timeout and/or slot limit constraints to prevent permanent slot occupation and ensure network stability:

  • ValInactive and ValReady (Idle Timeout): Without an idle timeout, some validators could remain in the ValInactive or ValReady state indefinitely, effectively occupying a slot and preventing new validators from joining the network.

  • ValPaused (Timeout and Slot Limit): The timeout prevents extended maintenance periods, while the slot limit ensures enough validators remain active for consensus. Additionally, the total pausable time per epoch is limited to prevent abuse—validators who exceed the maximum allowed paused time within an epoch will be penalized through VRank violation as defined in KIP-227. This applies regardless of whether the pause was voluntary or involuntary.

  • ValExiting (Slot Limit): This prevents too many validators from exiting simultaneously, which could destabilize the network. The slot limit ensures gradual offboarding.

Backwards Compatibility

This KIP introduces a new framework for candidate and validator states, and all participants must follow the rules defined in this KIP.

Security Considerations

Querying the node state on-chain cannot be trusted, because the on-chain state is not block-level atomic. For example, given the following transaction ordering:

block N:
  tx0: read  validator.state (returns ValActive)
  tx1: write validator.state = ValPaused (transition to ValPaused)
  tx2: read  validator.state (returns ValPaused)

tx2 reads the state as ValPaused, while the validator is still treated as ValActive in block N. Smart contract developers are strongly discouraged from relying on the on-chain node state for critical decisions.

Copyright and related rights waived via CC0.