KIP 227: Candidate and Validator Evaluation

Author	Joseph, Lewis, Ian, Ollie, Lake
Discussions-To	https://github.com/kaiachain/kips/issues/84
Status	Draft
Type	Core
Created	2025-01-07

Simple Summary

This proposal presents VRank, a framework for quantitatively assessing the performance and stability of candidates and validators in the Kaia Chain network.

Abstract

As Kaia Chain transitions to a permissionless network structure, holding validators accountable becomes critical, because they are responsible for keeping the network running smoothly and securely. This KIP introduces VRank, a Validator Reputation Evaluation Framework that quantitatively assesses the performance and stability of both candidates and validators. VRank aims to ensure that nodes involved in the consensus mechanism are trustworthy and capable of meeting the security requirements of the permissionless Kaia Chain network.

Introduction

Kaia Chain plans to transition from a permissioned network to a permissionless network. In a permissionless network, anyone can become a validator without prior approval. This makes the network more decentralized, secure, and open, in line with Kaia Chain’s goal of a more open and robust blockchain ecosystem. Refer to KGP-4: Permissionless Kaia Chain for details on the transition.

Motivation

In decentralized networks employing blockchain technology, the reliability and performance of validator nodes are crucial for maintaining network stability, security, and efficiency. Validator nodes propose and validate new blocks, ensuring ledger integrity and establishing trust among participants through consistent operation.

However, not all validator nodes operate at optimal efficiency. Some may experience frequent failures or delays, while others may behave maliciously, either deliberately or under external pressure. These problems can cause network delays, increase the chance of forks, and raise susceptibility to attacks such as double-spending and censorship.

Current systems for evaluating validator performance may not effectively penalize persistent underperformance or malicious behavior, and they do not consistently reward optimal performance. A thorough evaluation system is needed to measure the reliability of validator nodes precisely, discourage poor performance, and improve the overall integrity of the network.

This proposal introduces an innovative evaluation framework for candidates and validators, highlighting measurable performance metrics. Our objective is to create a more resilient and fair framework for assessing node performance by defining metrics such as the Proposal Failure Score (PFS) and Candidate Failure Score (CFS). This framework aims to identify underperforming or malicious nodes, which helps preserve high standards among validators.

The framework promotes consistent uptime and reliability. Validators have an incentive to maintain the stability and responsiveness of their nodes, thereby maintaining the performance of the network.

Specification

Parameters

Constant	Value/Definition
`FORK_BLOCK`	TBD
`CANDIDATE_MSG_TIMEOUT`	Protocol parameter (milliseconds). Default = 500ms.
`EPOCH_LENGTH`	86,400 blocks (approximately 1 day, assuming 1-second block time)
`MAX_BYZANTINE_NODES` (`F`)	Calculated as `F = floor(len(proposerSet) / 3)`, where `proposerSet` is the set of distinct proposers observed in the current epoch up to and including the block at which CFS is computed
`PFS_THRESHOLD`	2 (reaching the threshold means not qualified)
`CFS_THRESHOLD`	300 (reaching the threshold means not qualified)
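For illustration, the parameters above might be expressed as Go constants as follows. This is a minimal sketch: the identifier names (`CandidateMsgTimeoutMs`, `maxByzantineNodes`, and so on) are hypothetical, and `FORK_BLOCK` is omitted because its value is still TBD.

```go
package main

// Hypothetical Go constants mirroring the Parameters table.
// FORK_BLOCK is TBD in the spec and is therefore omitted.
const (
	CandidateMsgTimeoutMs = 500   // CANDIDATE_MSG_TIMEOUT default, in milliseconds
	EpochLength           = 86400 // EPOCH_LENGTH, in blocks (~1 day at 1s block time)
	PFSThreshold          = 2     // PFS >= PFSThreshold means not qualified
	CFSThreshold          = 300   // CFS >= CFSThreshold means not qualified
)

// maxByzantineNodes returns F = floor(len(proposerSet) / 3), where
// distinctProposers is the number of distinct proposers observed in the
// epoch so far. Integer division on non-negative ints already floors.
func maxByzantineNodes(distinctProposers int) int {
	return distinctProposers / 3
}
```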

Data Structures and Protocol Primitives

Block Header Extension (VRank)

Starting from FORK_BLOCK, the block header includes a new field VRank. Its payload depends on the block position within the epoch:

Block position	`header.VRank`
`N % EPOCH_LENGTH != 0`	`RLPEncode(cfReport(N))`, or `nil` if empty
`N % EPOCH_LENGTH == 0`	`RLPEncode(CandTesting(N))` (MUST NOT be `nil`)

Note: The index N of cfReport(N) refers to the block in which the report is recorded, not the block it evaluates. There are two perspectives on the same data:

Writer (proposer of block N): builds the report from candidate evaluation conducted during block N-1’s consensus (i.e., from VRankCandidate messages collected for block N-1), and writes it into header(N).VRank.

Reader (any node): decodes header(N).VRank to obtain cfReport(N).

In short: evaluate(N-1) → cfReport(N) → header(N).VRank.

type Header struct {
    ParentHash   common.Hash
    // ... existing fields ...
    Extra        []byte
    Governance   []byte
    Vote         []byte
    BaseFee      *big.Int
    RandomReveal []byte
    MixHash      []byte
    VRank        []byte  // New field
}
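The encoding table for header.VRank can be sketched as a small Go function. The `Address` type, `encodeFunc`, and `vrankPayload` are hypothetical names; the RLP encoder is passed in as a function so the sketch has no external dependencies, and a real client would plug in its own RLP implementation.

```go
package main

import "errors"

// Address is a hypothetical stand-in for a 20-byte account address.
type Address [20]byte

// encodeFunc abstracts the RLP encoder required by the spec; it is injected
// so that this sketch stays dependency-free.
type encodeFunc func([]Address) []byte

// vrankPayload chooses header(N).VRank per the encoding table:
// epoch-start blocks carry RLPEncode(CandTesting(N)), which must not be nil;
// other blocks carry RLPEncode(cfReport(N)), or nil when the report is empty.
func vrankPayload(n, epochLength uint64, cfReport, candTesting []Address, encode encodeFunc) ([]byte, error) {
	if n%epochLength == 0 {
		out := encode(candTesting)
		if out == nil {
			return nil, errors.New("epoch-start VRank payload must not be nil")
		}
		return out, nil
	}
	if len(cfReport) == 0 {
		return nil, nil // no candidate failures recorded for this block
	}
	return encode(cfReport), nil
}
```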

Reports (pfReport, cfReport)

Both pfReport and cfReport are per-block data structures. A node’s presence in either report is undesirable: it indicates a failure, and the node may be penalized in future epoch evaluations.

pfReport(N) (Proposal Failure Report): For the mined block N, pfReport(N) = { GetProposer(N, R) : R ∈ [0, r) } where r is the round that reached consensus for block N. Extractable from header(N).Extra.

Format: pfReport(N) -> [proposerAddrRound0, proposerAddrRound1, ...] with at most one entry per validator (validator(N)).

cfReport(N) (Candidate Failure Report): Covers candidate evaluation during block N-1’s consensus. Recorded in block N. Contains the list of candidates (nodes in CandTesting at block N-1) that failed to send a valid VRankCandidate message on-time for block N-1.

Format: cfReport(N) -> [candidateAddr1, candidateAddr2, ...] with at most one entry per candidate of previous block (candidate(N-1)).
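A minimal sketch of deriving pfReport(N) from the round that reached consensus, assuming a hypothetical `proposerFunc` that stands in for GetProposer(N, R) (the real selection logic lives in the valset module). The spec does not state an entry order, so this sketch assumes order of first appearance by round.

```go
package main

// Address is a hypothetical 20-byte address type.
type Address [20]byte

// proposerFunc stands in for GetProposer(N, R); not the real valset API.
type proposerFunc func(block uint64, round uint64) Address

// pfReport returns { GetProposer(N, R) : R in [0, r) } for a block that
// reached consensus in round r, with at most one entry per validator.
// Entry order (first appearance by round) is an assumption of this sketch.
func pfReport(n, r uint64, getProposer proposerFunc) []Address {
	seen := make(map[Address]bool)
	var report []Address
	for round := uint64(0); round < r; round++ {
		p := getProposer(n, round)
		if !seen[p] {
			seen[p] = true
			report = append(report, p)
		}
	}
	return report
}
```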

VRankPreprepare

VRankPreprepare is a message type sent by the proposer of block N to all candidates in CandTesting after it has sent Istanbul Preprepare messages to consensus participants. It triggers candidates to respond with VRankCandidate. The timeout for candidate responses is CANDIDATE_MSG_TIMEOUT (default 500ms).

type VRankPreprepare struct {
	Block *types.Block
	View  *istanbul.View
	Sig   []byte // proposer's signature over vrankPreprepareSigHash(chainID, blockNum, round, blockHash)
}

VRankCandidate

VRankCandidate is a message type sent by each candidate (node in CandTesting) to all validators in ValActive upon receiving VRankPreprepare. To be counted as on-time, a candidate’s VRankCandidate must arrive within CANDIDATE_MSG_TIMEOUT of the receiving validator’s preprepared_time.

Signature scheme: VRankCandidate carries two signatures:

ECDSA signature (Sig): MUST be produced with the candidate’s validator signing key over keccak256("VRANK_CANDIDATE_V1" || chain_id || block_number || round || block_hash), with an unambiguous canonical encoding of each field.
BLS signature (BlsSig): MUST be produced with the candidate’s registered BLS key (per KIP-113) over the same hash as the ECDSA signature: keccak256("VRANK_CANDIDATE_V1" || chain_id || block_number || round || block_hash).

type VRankCandidate struct {
	BlockNumber uint64
	Round       uint8
	BlockHash   common.Hash
	Sig         []byte // ECDSA signature over vrankCandidateSigHash(chainID, blockNumber, round, blockHash)
	BlsSig      []byte // BLS signature over vrankCandidateSigHash(chainID, blockNumber, round, blockHash)
}
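The spec requires an unambiguous canonical encoding of the signed fields but does not fix one. The sketch below assumes one possible layout (the domain tag, then fixed-width big-endian integers, then the raw 32-byte block hash) and builds only the preimage; hashing it with keccak256 and signing it with the ECDSA and BLS keys are left to the client’s crypto libraries. The function name, the `uint64` chain ID, and the field widths are assumptions, not the normative encoding.

```go
package main

import "encoding/binary"

// vrankCandidateTag is the domain-separation prefix named in the spec.
const vrankCandidateTag = "VRANK_CANDIDATE_V1"

// vrankCandidatePreimage builds the bytes to be hashed with keccak256.
// ASSUMED layout: tag || chainID (8 bytes BE) || blockNumber (8 bytes BE)
// || round (1 byte) || blockHash (32 bytes). Fixed widths keep it unambiguous.
func vrankCandidatePreimage(chainID, blockNumber uint64, round uint8, blockHash [32]byte) []byte {
	var u64 [8]byte
	buf := make([]byte, 0, len(vrankCandidateTag)+8+8+1+32)
	buf = append(buf, vrankCandidateTag...)
	binary.BigEndian.PutUint64(u64[:], chainID)
	buf = append(buf, u64[:]...)
	binary.BigEndian.PutUint64(u64[:], blockNumber)
	buf = append(buf, u64[:]...)
	buf = append(buf, round)
	buf = append(buf, blockHash[:]...)
	return buf
}
```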

Consensus Protocol Integration

VRank runs in parallel with consensus. Per block, reports (pfReport and cfReport) are produced during consensus and committed in the next block header.

Proposer of block N

After having sent Istanbul Preprepare messages to consensus participants, the proposer MUST send VRankPreprepare to all candidates in CandTesting.
The round information is recorded in header.Extra as part of the existing consensus. If the proposer fails to propose and a round change occurs, the failed proposer’s address is recorded in pfReport(N).

Validators during consensus for block N

When block N enters the preprepared pBFT state, each validator MUST record preprepared_time.
Each validator MUST collect VRankCandidate messages from candidates in CandTesting and record each message’s arrival time. A message is considered valid only if both its ECDSA signature (Sig) and its BLS signature (BlsSig) are valid. The BLS public key is resolved from the candidate’s KIP-113 registration at the current chain head.
If a validator receives more than one VRankCandidate from the same candidate for the same view (block number N and round R), only the first valid message MUST be accepted; subsequent messages MUST be ignored.
A candidate is counted as on-time if its message is valid and either (a) it arrives before preprepared_time, or (b) arrival_time - preprepared_time ≤ CANDIDATE_MSG_TIMEOUT. Otherwise, the candidate is recorded in cfReport(N+1).

Candidates (nodes in CandTesting)

Upon receiving VRankPreprepare for block N, each candidate MUST broadcast VRankCandidate to all validators in ValActive.
To be counted as on-time, the VRankCandidate MUST arrive at each validator within CANDIDATE_MSG_TIMEOUT of that validator’s preprepared_time for block N.

Proposer of block N+1

The proposer MUST set header.VRank per the encoding table in Block Header Extension (VRank).
cfReport(N+1) MUST include each candidate (in CandTesting at block N) who either (a) did not send a VRankCandidate for block N on-time, or (b) sent an invalid message (including ECDSA or BLS signature failure, or a missing KIP-113 BLS key registration).
Candidates in cfReport are counted as failures for CFS aggregation. The epoch-start candidate list is informational only and does not contribute to CFS.
If block N+1 is an epoch-start block ((N+1) % EPOCH_LENGTH == 0), the proposer MUST NOT include a cfReport for block N; instead, header(N+1).VRank carries CandTesting(N+1) as specified in the encoding table.
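A sketch of how the proposer of block N+1 might assemble cfReport(N+1), assuming a hypothetical `onTime` set produced by the validator-side collection step. The result is deduplicated and sorted in ascending byte order, which the Block Validation rules require; at an epoch-start block no cfReport is produced.

```go
package main

import (
	"bytes"
	"sort"
)

// Address is a hypothetical 20-byte address type.
type Address [20]byte

// buildCFReport returns cfReport(next) for next = N+1: every candidate in
// CandTesting(N) that is not in onTime, deduplicated and sorted ascending.
// For an epoch-start block it returns nil, because header(N+1).VRank then
// carries CandTesting(N+1) instead.
func buildCFReport(next, epochLength uint64, candTesting []Address, onTime map[Address]bool) []Address {
	if next%epochLength == 0 {
		return nil
	}
	seen := make(map[Address]bool)
	var report []Address
	for _, c := range candTesting {
		if onTime[c] || seen[c] {
			continue
		}
		seen[c] = true
		report = append(report, c)
	}
	sort.Slice(report, func(i, j int) bool {
		return bytes.Compare(report[i][:], report[j][:]) < 0
	})
	return report
}
```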

Block Validation

Before FORK_BLOCK, header.VRank MUST be empty (zero-length bytes).

After FORK_BLOCK, validators MUST validate header.VRank per the encoding table in Block Header Extension (VRank), and additionally:

At epoch-start (N % EPOCH_LENGTH == 0), the decoded list MUST exactly equal CandTesting(N) resolved at block N (preserving the order returned by the valset module, with no duplicates).
At a non-epoch block with a non-empty payload, cfReport(N) MUST be sorted in ascending byte order, contain at most one entry per candidate ID, and each entry MUST be a candidate address from CandTesting(N-1).
At a non-epoch block, an empty payload (nil or zero-length) is permitted and represents no candidate failures for that block.
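The validation rules above, applied after RLP decoding, might look like the following sketch. `validateDecodedVRank` and its parameters are hypothetical; `payloadEmpty` reports whether the raw header.VRank bytes were nil or zero-length, since an epoch-start header with an empty candidate list still carries a non-empty RLP encoding of an empty list.

```go
package main

import (
	"bytes"
	"errors"
)

// Address is a hypothetical 20-byte address type.
type Address [20]byte

// validateDecodedVRank checks header(N).VRank after FORK_BLOCK.
// candTestingN is CandTesting(N); candTestingPrev is CandTesting(N-1).
func validateDecodedVRank(n, epochLength uint64, payloadEmpty bool, decoded, candTestingN, candTestingPrev []Address) error {
	if n%epochLength == 0 {
		if payloadEmpty {
			return errors.New("epoch-start VRank must not be empty")
		}
		// Must equal CandTesting(N) exactly, preserving valset order.
		if len(decoded) != len(candTestingN) {
			return errors.New("epoch-start list does not match CandTesting(N)")
		}
		for i := range decoded {
			if decoded[i] != candTestingN[i] {
				return errors.New("epoch-start list does not match CandTesting(N)")
			}
		}
		return nil
	}
	if payloadEmpty {
		return nil // no candidate failures for this block
	}
	allowed := make(map[Address]bool, len(candTestingPrev))
	for _, c := range candTestingPrev {
		allowed[c] = true
	}
	for i, c := range decoded {
		if !allowed[c] {
			return errors.New("cfReport entry not in CandTesting(N-1)")
		}
		// Strictly ascending order implies both sortedness and no duplicates.
		if i > 0 && bytes.Compare(decoded[i-1][:], c[:]) >= 0 {
			return errors.New("cfReport not strictly ascending")
		}
	}
	return nil
}
```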

Failure Scores (PFS, CFS)

Each score is computed per epoch from the pfReports and cfReports of the blocks in that epoch. Higher values indicate worse performance; zero indicates no failures.

Proposal Failure Score (PFS): For a given block number N, PFS MUST be computed from pfReport(b) for blocks b ∈ [epochStart(N), N]. For each validator, count how many times the validator appears across all pfReports in the epoch (each round change adds one entry). PFS maps each validator address to its total proposal failure count.

Format: pfs(N) -> map[proposerAddr]score
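A minimal sketch of the PFS computation, assuming the pfReports have already been extracted from header(b).Extra into a map keyed by block number (a hypothetical input shape; `computePFS` is likewise a hypothetical name).

```go
package main

// Address is a hypothetical 20-byte address type.
type Address [20]byte

// computePFS counts proposal failures per validator over blocks
// b in [epochStart(N), N], where epochStart(N) = N - N % epochLength.
func computePFS(n, epochLength uint64, pfReports map[uint64][]Address) map[Address]int {
	epochStart := n - n%epochLength
	pfs := make(map[Address]int)
	for b := epochStart; b <= n; b++ {
		for _, v := range pfReports[b] {
			pfs[v]++ // each entry is one failed proposal (one round change)
		}
	}
	return pfs
}
```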

Candidate Failure Score (CFS): For a given block number N, CFS MUST be computed from cfReport(b) for blocks b ∈ [epochStart(N), N] (note that cfReport(epochStart(N)) is empty). For each candidate C and reporter (proposer of block b): if C is in cfReport(b), that counts as 1 failure. Let proposerSet be the set of distinct proposers across all blocks b ∈ [epochStart(N), N] (every block contributes its proposer to proposerSet, including blocks with an empty cfReport), and let F = floor(len(proposerSet) / 3). For each candidate, sum failures per reporter over the epoch, discard the highest F reporter totals (Byzantine filtering), and sum the remainder to obtain CFS.

Format: cfs(N) -> map[candidateAddr]score
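The CFS aggregation and Byzantine filtering described above can be sketched in Go as follows. `blockReport` and `computeCFS` are hypothetical names; each entry pairs a block’s proposer (the reporter) with its decoded cfReport. Because reporters with zero failures contribute nothing, only non-zero per-reporter totals need to be ranked before the F highest are dropped.

```go
package main

import "sort"

// Address is a hypothetical 20-byte address type.
type Address [20]byte

// blockReport is one block b in [epochStart(N), N]: its proposer and cfReport(b).
type blockReport struct {
	proposer Address
	cfReport []Address
}

// computeCFS sums failures per (candidate, reporter), then for each candidate
// drops the F highest reporter totals, F = floor(len(proposerSet) / 3).
func computeCFS(blocks []blockReport) map[Address]int {
	proposerSet := make(map[Address]bool)
	perCand := make(map[Address]map[Address]int) // candidate -> reporter -> failures
	for _, blk := range blocks {
		proposerSet[blk.proposer] = true // every block counts, even with an empty report
		for _, c := range blk.cfReport {
			if perCand[c] == nil {
				perCand[c] = make(map[Address]int)
			}
			perCand[c][blk.proposer]++
		}
	}
	f := len(proposerSet) / 3
	cfs := make(map[Address]int)
	for c, byReporter := range perCand {
		totals := make([]int, 0, len(byReporter))
		for _, n := range byReporter {
			totals = append(totals, n)
		}
		sort.Sort(sort.Reverse(sort.IntSlice(totals))) // highest totals first
		sum := 0
		for i, n := range totals {
			if i >= f { // skip the F highest reporter totals
				sum += n
			}
		}
		cfs[c] = sum
	}
	return cfs
}
```

Running it on the data from Example 1 yields CFS values of 1, 1, and 0 for C1, C2, and C3.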

Example: Byzantine filtering in CFS

Example 1: Short epoch

EPOCH_LENGTH = 5 (the epoch starting at block 5 covers blocks 5 to 9)
len(candidates) = 3

proposer(5)=P1, cfReport(5)=[]
proposer(6)=P2, cfReport(6)=[]
proposer(7)=P3, cfReport(7)=[C1,C2,C3]
proposer(8)=P4, cfReport(8)=[C1,C2]
proposer(9)=P4, cfReport(9)=[C1,C2]

proposerSet = {P1, P2, P3, P4}
F = floor(len(proposerSet) / 3) = floor(4 / 3) = 1

Aggregated cfReport(N) where N ∈ [5, 9]:

Candidate \ Reporter	P1	P2	P3	P4	Total (raw cfReport)	Filtered (CFS)	Byzantine filtering
C1	0	0	1	2	3	1	P4 is not counted
C2	0	0	1	2	3	1	P4 is not counted
C3	0	0	1	0	1	0	P3 is not counted

Example 2: Byzantine behavior

Consider a network with 10 validators (proposers P1–P10) and 5 candidates (C1–C5). The table shows, for each candidate, how many times each proposer listed it in the cfReports it produced over the epoch (failures reported per candidate per reporter). P8, P9, and P10 report abnormally high counts for C1–C3, suggesting Byzantine behavior. With F = floor(10 / 3) = 3, the 3 highest reporter totals are discarded for each candidate and the remainder is summed to obtain the filtered CFS.

Candidate \ Reporter	P1	P2	P3	P4	P5	P6	P7	P8	P9	P10	Total (raw cfReport)	Filtered (CFS)	Byzantine filtering
C1	14	12	15	34	12	32	20	8640	8637	8634	26050	139	exclude reports from P8,P9,P10
C2	48	10	59	33	49	49	41	8640	8637	8634	26200	289	exclude reports from P8,P9,P10
C3	48	22	40	41	44	27	61	8640	8637	8634	26194	283	exclude reports from P8,P9,P10
C4	50	29	45	30	23	2	42	56	56	64	397	221	exclude reports from P8,P9,P10
C5	71	34	62	5	11	20	18	30	19	13	283	116	exclude reports from P1,P2,P3

Note: Each cfReport(N) is in the header of block N and reports on target block N - 1.
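The filtered column of Example 2 can be reproduced with the filtering step alone. In this sketch, the hypothetical `filteredCFS` takes one candidate’s per-reporter totals (one row of the table) and F, sorts a copy of the totals in descending order, and sums everything after the first F.

```go
package main

import "sort"

// filteredCFS drops the f highest per-reporter totals and sums the rest.
func filteredCFS(perReporter []int, f int) int {
	totals := append([]int(nil), perReporter...) // copy so the caller's slice is not reordered
	sort.Sort(sort.Reverse(sort.IntSlice(totals)))
	sum := 0
	for i, n := range totals {
		if i >= f {
			sum += n
		}
	}
	return sum
}
```

With F = 3, the rows for C1 to C5 give 139, 289, 283, 221, and 116, matching the table.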

Score thresholds

A node must meet the following stability requirements to participate in consensus:

Block proposal participation: Fewer than PFS_THRESHOLD proposal failures per epoch.
Downtime: Less than 0.5% downtime per epoch (fewer than 432 blocks missed).

Violations:

A validator whose PFS reaches PFS_THRESHOLD in an epoch (PFS >= PFS_THRESHOLD) is classified as not qualified.
A candidate whose CFS reaches CFS_THRESHOLD in an epoch (CFS >= CFS_THRESHOLD) is classified as not qualified.
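The threshold rules above reduce to simple comparisons. The sketch below uses hypothetical function names and also shows where the 432-block downtime bound comes from (0.5% of an 86,400-block epoch). How not-qualified nodes are handled is outside this sketch.

```go
package main

const (
	pfsThreshold = 2     // PFS_THRESHOLD
	cfsThreshold = 300   // CFS_THRESHOLD
	epochLength  = 86400 // EPOCH_LENGTH
)

// maxMissedBlocks: fewer than 0.5% of an epoch, i.e. 86,400 * 5 / 1000 = 432.
const maxMissedBlocks = epochLength * 5 / 1000

// validatorQualified reports whether a validator's PFS stays below the threshold.
func validatorQualified(pfs int) bool { return pfs < pfsThreshold }

// candidateQualified reports whether a candidate's CFS stays below the threshold.
func candidateQualified(cfs int) bool { return cfs < cfsThreshold }
```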

The handling of not-qualified nodes is specified in KIP-286.

Rationale

Choice of CFS_THRESHOLD

Historical data from VRank logs indicates that most healthy nodes pass the evaluation with CFS well below 300 per epoch. The threshold of 300 is therefore set to distinguish underperforming or unstable candidates while allowing normal nodes to qualify for validator promotion.

Importance of Mitigating Malicious Behavior

Byzantine Nodes
In a permissionless environment, some validators may act maliciously, attempting to disrupt the network or unfairly penalize honest nodes. It is assumed that up to one-third of the validators may behave maliciously.

Filtering Mechanisms
To mitigate the impact of malicious validators, the CFS calculation discards, for each candidate, the F highest per-reporter failure totals. This prevents a few Byzantine reporters from distorting the evaluation of honest candidates.

Robust Scoring Algorithms
VRank’s scoring is designed so that, under the one-third Byzantine assumption, honest nodes are not unfairly penalized because of the actions of Byzantine nodes.

Importance of the 500ms Deadline

Ensuring Consensus Responsiveness

The default CANDIDATE_MSG_TIMEOUT of 500ms ensures that candidates respond promptly, supporting the network’s goal of generating a block every second.

Regional Centralization

Candidate-to-validator latency depends on distance: candidates near the validator cluster (where most validators are located) observe shorter latency; those farther away observe longer latency. A short timeout would favor candidates collocated with the validator majority and effectively exclude those at greater distance, reinforcing regional concentration. This carries significant risks: (a) a regional network outage could affect a large fraction of the validator set, and (b) operators outside the cluster face a higher barrier to participation.
A longer timeout (e.g., 500ms) allows candidates farther from the cluster to participate, improving decentralization and resilience.

Block Time Impact

A longer timeout does not necessarily slow block production. Block progression depends on the Istanbul consensus flow, starting with the proposer’s Preprepare, completing on time; VRankCandidate is an auxiliary evaluation message collected in parallel. The 500ms window accounts for global network latency variations without unfairly penalizing distant candidates, while the primary consensus path remains unaffected.

Design Choice

Given that regional centralization poses a greater risk than the marginal impact of a 500ms evaluation window, the design favors a longer timeout to support geographic diversity.

The omission of signatures in cfReport

If cfReport were required to include valid VRankCandidate messages (with signatures), each entry would need a verifiable signature. However, the proposer still decides which candidates appear in cfReport. A malicious proposer could intentionally omit a candidate’s valid signature and claim that the candidate did not send any message, thereby falsely penalizing an honest candidate (a false positive).

Including signatures would block false negatives: a proposer could not claim that a candidate responded without holding the candidate’s valid signature. However, if the proposer colludes with the candidate, they can still omit the candidate from cfReport even when it failed, bypassing the signature check.

Given that signatures cannot fully prevent manipulation in either direction, and that signatures add significant size to the report, we decided to simplify: cfReport is a list of candidate addresses only (no signatures). The Byzantine filtering in CFS (excluding the highest F reporter totals) mitigates the impact of malicious proposers.

The exclusion of pfReport from header.VRank

pfReport is extracted from header.Extra rather than stored in header.VRank. Round-change information is recorded during consensus, before the block is finalized. If pfReport were written into header.VRank upon each round change, the header would need to be updated mid-consensus. Supporting such updates would require substantial changes to the current implementation. The Extra field is already populated during consensus with round-change data, so pfReport is derived from there instead.

`header.VRank` at `k*EPOCH_LENGTH`

The validator set changes every EPOCH_LENGTH, so there may be new validators at block k*EPOCH_LENGTH that were not validators at block k*EPOCH_LENGTH - 1. Those new validators did not participate in consensus for block k*EPOCH_LENGTH - 1 and therefore could not have collected VRankCandidate messages. The proposer of block k*EPOCH_LENGTH may be such a new validator, so it cannot produce a valid cfReport(k*EPOCH_LENGTH). Instead of leaving the field empty, the proposer MUST embed CandTesting(k*EPOCH_LENGTH), the full candidate list for the new epoch, into header.VRank. This anchors the epoch’s candidate set in the consensus-validated header, giving all nodes a single authoritative reference for the candidate set when CFS aggregation begins.

Backward Compatibility

The introduction of VRank does not affect nodes before FORK_BLOCK; they continue to function as before. After FORK_BLOCK, the new VRank header field and its associated validation rules take effect.

Security Considerations

Handling Byzantine Nodes

Assumption of One-Third Malicious Validators
We adopt the standard Byzantine fault tolerance assumption that fewer than one-third of validators behave maliciously. Kaia Chain’s consensus relies on this assumption for safety and liveness, and VRank’s scoring mechanism is designed with the same threshold in mind, allowing the network to function correctly even in the presence of some malicious actors.

Limitations and Contingencies

If the number of malicious validators exceeds one-third, the network’s ability to reach consensus and maintain integrity may be compromised.

Justification for the Assumption
While it’s challenging to prevent all malicious activity, assuming that up to one-third of validators could be compromised provides a practical balance between security and network performance.

Implementation

See the permissionless branch for the reference implementation.

Appendix: Node Models

The following node models were considered when designing VRank and defining a stable node. They describe the philosophy behind the scoring thresholds and help clarify the types of behavior VRank aims to distinguish.

VRank categorizes nodes into four models to evaluate their performance and stability:

Node	Performance	Impact
Uptime > 99.5%, no network issues	Excellent	Contributes to network stability
Uptime about 99.5%, temporarily unstable	Good	May delay block time
Uptime < 99.5%	Not good	May fail to propose a block
Halts continuously regardless of uptime	Bad	May affect consensus if enough nodes behave this way
Uptime > 99.5%, tries to destabilize the network	N/A	Threatens network integrity

Node A: Stable Node

Characteristics: Capable of performing validation duties with optimal performance and stability.

Impact on Network: Contributes positively to network stability and performance.

Node B: Temporarily Unstable Node

Characteristics: Experiences brief, frequent network disruptions that last a few seconds.

Impact on Network: May delay block creation if selected as a proposer but does not cause a round change.

Node C: Intermittently Stopping Node

Characteristics: Experiences longer network disruptions (tens of seconds).

Impact on Network: May fail to propose a block when selected as a proposer, resulting in round changes and significant delays.

Node M: Malicious Node

Characteristics: Intentionally attempts to destabilize the network through malicious actions.

Impact on Network: Threatens network security and integrity. VRank aims to mitigate the influence of such nodes.
