引言

本文将详细介绍如何基于Raft算法从零开始构建一个完整的分布式KV存储系统。我们将实现一个具有高可用性、强一致性和分区容错性的分布式键值数据库,涵盖客户端SDK、服务端集群、数据分片、故障恢复等核心功能。

通过实际的Java代码实现,你将学习到分布式系统开发的核心技术和最佳实践。

系统架构设计

1. 整体架构图

流程图表

节点说明:

节点描述
Monitor监控服务
Metrics指标收集
Alert告警系统

关系流向:

1
2
3
4
5
Client[客户端应用] → SDK[RaftKV SDK]
SDK → LB[负载均衡]
LB → Node1[Raft节点1]
LB → Node2[Raft节点2]
LB → Node3[Raft节点3]

2. 分层架构

流程图表

节点说明:

节点描述
App应用程序
SDKRaftKV SDK
APIAPI网关
RPCRPC服务
RaftRaft算法层
Log日志复制
SM状态机
Engine存储引擎
Index索引管理
WAL预写日志

关系流向:

1
2
3
4
5
App → SDK
SDK → API
API → RPC
RPC → Raft
Raft → Log

核心组件实现

1. RaftKV核心服务

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.time.Instant;
import java.time.Duration;
import java.nio.ByteBuffer;
import java.util.logging.Logger;

public class RaftKVServer {
    private static final Logger logger = Logger.getLogger(RaftKVServer.class.getName());

    // 服务器配置
    public static class ServerConfig {
        public final String nodeId;
        public final String host;
        public final int port;
        public final Duration requestTimeout = Duration.ofSeconds(5);
        public final Duration leaderElectionTimeout = Duration.ofMillis(300);
        public final Duration heartbeatInterval = Duration.ofMillis(150);
        public final int maxLogEntries = 1000;
        public final boolean enableCompression = true;
        public final int snapshotThreshold = 10000;

        public ServerConfig(String nodeId, String host, int port) {
            this.nodeId = nodeId;
            this.host = host;
            this.port = port;
        }
    }

    // KV操作类型
    public enum OperationType {
        GET, PUT, DELETE, SCAN, BATCH
    }

    // KV操作命令
    public static class Command {
        public final String id;
        public final OperationType type;
        public final String key;
        public final byte[] value;
        public final Map<String, byte[]> batch;
        public final String scanPrefix;
        public final int scanLimit;
        public final Instant timestamp;

        public Command(String id, OperationType type, String key, byte[] value) {
            this.id = id;
            this.type = type;
            this.key = key;
            this.value = value != null ? value.clone() : null;
            this.batch = null;
            this.scanPrefix = null;
            this.scanLimit = 0;
            this.timestamp = Instant.now();
        }

        public Command(String id, Map<String, byte[]> batch) {
            this.id = id;
            this.type = OperationType.BATCH;
            this.key = null;
            this.value = null;
            this.batch = new HashMap<>(batch);
            this.scanPrefix = null;
            this.scanLimit = 0;
            this.timestamp = Instant.now();
        }

        public Command(String id, String scanPrefix, int scanLimit) {
            this.id = id;
            this.type = OperationType.SCAN;
            this.key = null;
            this.value = null;
            this.batch = null;
            this.scanPrefix = scanPrefix;
            this.scanLimit = scanLimit;
            this.timestamp = Instant.now();
        }
    }

    // 操作结果
    public static class Result {
        public final boolean success;
        public final byte[] value;
        public final Map<String, byte[]> scanResults;
        public final String error;
        public final long version;

        public Result(boolean success, byte[] value, String error, long version) {
            this.success = success;
            this.value = value != null ? value.clone() : null;
            this.scanResults = null;
            this.error = error;
            this.version = version;
        }

        public Result(boolean success, Map<String, byte[]> scanResults, String error) {
            this.success = success;
            this.value = null;
            this.scanResults = scanResults != null ? new HashMap<>(scanResults) : null;
            this.error = error;
            this.version = 0;
        }

        public static Result success(byte[] value, long version) {
            return new Result(true, value, null, version);
        }

        public static Result success(Map<String, byte[]> scanResults) {
            return new Result(true, scanResults, null);
        }

        public static Result error(String error) {
            return new Result(false, null, error, 0);
        }
    }

    private final ServerConfig config;
    private final RaftNode raftNode;
    private final KVStateMachine stateMachine;
    private final RequestProcessor requestProcessor;
    private final PerformanceMonitor performanceMonitor;
    private final ScheduledExecutorService scheduler;

    // 待处理的请求
    private final Map<String, CompletableFuture<Result>> pendingRequests = new ConcurrentHashMap<>();

    // 服务器状态
    private volatile boolean isRunning = false;
    private final AtomicLong requestCounter = new AtomicLong(0);

    public RaftKVServer(ServerConfig config, List<String> peers) {
        this.config = config;
        this.stateMachine = new KVStateMachine();
        this.raftNode = new RaftNode(config.nodeId, peers, stateMachine);
        this.requestProcessor = new RequestProcessor();
        this.performanceMonitor = new PerformanceMonitor();
        this.scheduler = Executors.newScheduledThreadPool(4);
    }

    // 启动服务器
    public void start() throws Exception {
        if (isRunning) {
            throw new IllegalStateException("Server is already running");
        }

        logger.info("Starting RaftKV server: " + config.nodeId);

        // 启动Raft节点
        raftNode.start();

        // 启动请求处理器
        requestProcessor.start();

        // 启动性能监控
        performanceMonitor.start();

        // 启动定期任务
        startPeriodicTasks();

        isRunning = true;
        logger.info("RaftKV server started successfully");
    }

    // 停止服务器
    public void stop() {
        if (!isRunning) {
            return;
        }

        logger.info("Stopping RaftKV server: " + config.nodeId);

        isRunning = false;
        scheduler.shutdown();
        requestProcessor.stop();
        performanceMonitor.stop();
        raftNode.stop();

        logger.info("RaftKV server stopped");
    }

    // 处理客户端请求
    public CompletableFuture<Result> handleRequest(Command command) {
        if (!isRunning) {
            return CompletableFuture.completedFuture(
                Result.error("Server is not running"));
        }

        requestCounter.incrementAndGet();
        performanceMonitor.recordRequest(command.type);

        // 只读操作可以在任何节点处理
        if (command.type == OperationType.GET || command.type == OperationType.SCAN) {
            return handleReadRequest(command);
        }

        // 写操作必须通过Raft提交
        return handleWriteRequest(command);
    }

    // 处理读请求
    private CompletableFuture<Result> handleReadRequest(Command command) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                long startTime = System.nanoTime();

                Result result;
                switch (command.type) {
                    case GET:
                        byte[] value = stateMachine.get(command.key);
                        long version = stateMachine.getVersion(command.key);
                        result = value != null ?
                            Result.success(value, version) :
                            Result.error("Key not found");
                        break;

                    case SCAN:
                        Map<String, byte[]> scanResults = stateMachine.scan(
                            command.scanPrefix, command.scanLimit);
                        result = Result.success(scanResults);
                        break;

                    default:
                        result = Result.error("Invalid read operation");
                }

                long duration = System.nanoTime() - startTime;
                performanceMonitor.recordLatency(command.type, duration);

                return result;

            } catch (Exception e) {
                logger.severe("Failed to handle read request: " + e.getMessage());
                return Result.error("Internal error: " + e.getMessage());
            }
        });
    }

    // 处理写请求
    private CompletableFuture<Result> handleWriteRequest(Command command) {
        if (!raftNode.isLeader()) {
            String leaderId = raftNode.getLeaderId();
            if (leaderId != null) {
                return CompletableFuture.completedFuture(
                    Result.error("Not leader, redirect to: " + leaderId));
            } else {
                return CompletableFuture.completedFuture(
                    Result.error("No leader available"));
            }
        }

        CompletableFuture<Result> future = new CompletableFuture<>();
        pendingRequests.put(command.id, future);

        // 将命令提交到Raft日志
        try {
            byte[] commandData = serializeCommand(command);
            raftNode.propose(commandData).whenComplete((success, ex) -> {
                if (ex != null) {
                    pendingRequests.remove(command.id);
                    future.complete(Result.error("Failed to propose: " + ex.getMessage()));
                }
                // 成功的情况会在状态机应用时处理
            });

            // 设置超时
            scheduler.schedule(() -> {
                CompletableFuture<Result> pendingFuture = pendingRequests.remove(command.id);
                if (pendingFuture != null && !pendingFuture.isDone()) {
                    pendingFuture.complete(Result.error("Request timeout"));
                }
            }, config.requestTimeout.toMillis(), TimeUnit.MILLISECONDS);

        } catch (Exception e) {
            pendingRequests.remove(command.id);
            future.complete(Result.error("Failed to serialize command: " + e.getMessage()));
        }

        return future;
    }

    // 命令应用回调(由状态机调用)
    public void onCommandApplied(String commandId, Result result) {
        CompletableFuture<Result> future = pendingRequests.remove(commandId);
        if (future != null && !future.isDone()) {
            future.complete(result);
        }
    }

    // 启动定期任务
    private void startPeriodicTasks() {
        // 性能统计
        scheduler.scheduleWithFixedDelay(() -> {
            performanceMonitor.logStatistics();
        }, 30, 30, TimeUnit.SECONDS);

        // 健康检查
        scheduler.scheduleWithFixedDelay(() -> {
            checkServerHealth();
        }, 10, 10, TimeUnit.SECONDS);

        // 内存清理
        scheduler.scheduleWithFixedDelay(() -> {
            cleanupExpiredRequests();
        }, 60, 60, TimeUnit.SECONDS);
    }

    // 健康检查
    private void checkServerHealth() {
        try {
            boolean isHealthy = raftNode.isHealthy() &&
                               stateMachine.isHealthy() &&
                               pendingRequests.size() < 10000;

            if (!isHealthy) {
                logger.warning("Server health check failed");
            }

        } catch (Exception e) {
            logger.severe("Health check error: " + e.getMessage());
        }
    }

    // 清理过期请求
    private void cleanupExpiredRequests() {
        long now = System.currentTimeMillis();
        pendingRequests.entrySet().removeIf(entry -> {
            CompletableFuture<Result> future = entry.getValue();
            if (future.isDone()) {
                return true;
            }
            // 清理超过5分钟的请求
            return false; // 简化实现,实际应该检查请求时间
        });
    }

    // 序列化命令
    private byte[] serializeCommand(Command command) throws Exception {
        ByteBuffer buffer = ByteBuffer.allocate(4096);

        // 写入命令ID
        byte[] idBytes = command.id.getBytes();
        buffer.putInt(idBytes.length);
        buffer.put(idBytes);

        // 写入操作类型
        buffer.putInt(command.type.ordinal());

        // 写入具体数据
        switch (command.type) {
            case PUT:
                byte[] keyBytes = command.key.getBytes();
                buffer.putInt(keyBytes.length);
                buffer.put(keyBytes);
                buffer.putInt(command.value.length);
                buffer.put(command.value);
                break;

            case DELETE:
                keyBytes = command.key.getBytes();
                buffer.putInt(keyBytes.length);
                buffer.put(keyBytes);
                break;

            case BATCH:
                buffer.putInt(command.batch.size());
                for (Map.Entry<String, byte[]> entry : command.batch.entrySet()) {
                    keyBytes = entry.getKey().getBytes();
                    buffer.putInt(keyBytes.length);
                    buffer.put(keyBytes);
                    buffer.putInt(entry.getValue().length);
                    buffer.put(entry.getValue());
                }
                break;
        }

        buffer.flip();
        byte[] result = new byte[buffer.remaining()];
        buffer.get(result);
        return result;
    }

    // Getter方法
    public boolean isLeader() {
        return raftNode.isLeader();
    }

    public String getLeaderId() {
        return raftNode.getLeaderId();
    }

    public long getRequestCount() {
        return requestCounter.get();
    }

    public ServerConfig getConfig() {
        return config;
    }
}

2. KV状态机实现

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
public class KVStateMachine implements StateMachine {
    private static final Logger logger = Logger.getLogger(KVStateMachine.class.getName());

    // 存储引擎
    private final ConcurrentMap<String, VersionedValue> store = new ConcurrentHashMap<>();
    private final AtomicLong globalVersion = new AtomicLong(0);
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // 索引
    private final ConcurrentSkipListMap<String, Set<String>> prefixIndex = new ConcurrentSkipListMap<>();

    // 统计信息
    private final AtomicLong totalKeys = new AtomicLong(0);
    private final AtomicLong totalOperations = new AtomicLong(0);

    // 版本化的值
    public static class VersionedValue {
        public final byte[] value;
        public final long version;
        public final Instant timestamp;
        public final boolean deleted;

        public VersionedValue(byte[] value, long version, boolean deleted) {
            this.value = value != null ? value.clone() : null;
            this.version = version;
            this.timestamp = Instant.now();
            this.deleted = deleted;
        }

        public VersionedValue(byte[] value, long version) {
            this(value, version, false);
        }
    }

    // 快照数据
    public static class Snapshot {
        public final Map<String, VersionedValue> data;
        public final long version;
        public final Instant timestamp;

        public Snapshot(Map<String, VersionedValue> data, long version) {
            this.data = new HashMap<>(data);
            this.version = version;
            this.timestamp = Instant.now();
        }
    }

    public KVStateMachine() {
        logger.info("Initialized KV state machine");
    }

    @Override
    public void apply(LogEntry entry) {
        try {
            totalOperations.incrementAndGet();
            Command command = deserializeCommand(entry.data);
            Result result = executeCommand(command);

            // 通知应用结果
            if (entry.callback != null) {
                entry.callback.onCommandApplied(command.id, result);
            }

        } catch (Exception e) {
            logger.severe("Failed to apply log entry: " + e.getMessage());
        }
    }

    // 执行命令
    private Result executeCommand(Command command) {
        lock.writeLock().lock();
        try {
            switch (command.type) {
                case PUT:
                    return executePut(command.key, command.value);

                case DELETE:
                    return executeDelete(command.key);

                case BATCH:
                    return executeBatch(command.batch);

                default:
                    return Result.error("Unsupported write operation: " + command.type);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // 执行PUT操作
    private Result executePut(String key, byte[] value) {
        long version = globalVersion.incrementAndGet();
        VersionedValue oldValue = store.get(key);

        if (oldValue == null || oldValue.deleted) {
            totalKeys.incrementAndGet();
        }

        VersionedValue newValue = new VersionedValue(value, version);
        store.put(key, newValue);

        // 更新前缀索引
        updatePrefixIndex(key, true);

        logger.fine(String.format("PUT key=%s, version=%d", key, version));
        return Result.success(null, version);
    }

    // 执行DELETE操作
    private Result executeDelete(String key) {
        VersionedValue oldValue = store.get(key);

        if (oldValue == null || oldValue.deleted) {
            return Result.error("Key not found");
        }

        long version = globalVersion.incrementAndGet();
        VersionedValue deletedValue = new VersionedValue(null, version, true);
        store.put(key, deletedValue);
        totalKeys.decrementAndGet();

        // 更新前缀索引
        updatePrefixIndex(key, false);

        logger.fine(String.format("DELETE key=%s, version=%d", key, version));
        return Result.success(null, version);
    }

    // 执行批量操作
    private Result executeBatch(Map<String, byte[]> batch) {
        long batchVersion = globalVersion.get();

        for (Map.Entry<String, byte[]> entry : batch.entrySet()) {
            String key = entry.getKey();
            byte[] value = entry.getValue();

            if (value != null) {
                // PUT操作
                long version = globalVersion.incrementAndGet();
                VersionedValue oldValue = store.get(key);

                if (oldValue == null || oldValue.deleted) {
                    totalKeys.incrementAndGet();
                }

                VersionedValue newValue = new VersionedValue(value, version);
                store.put(key, newValue);
                updatePrefixIndex(key, true);

            } else {
                // DELETE操作
                VersionedValue oldValue = store.get(key);

                if (oldValue != null && !oldValue.deleted) {
                    long version = globalVersion.incrementAndGet();
                    VersionedValue deletedValue = new VersionedValue(null, version, true);
                    store.put(key, deletedValue);
                    totalKeys.decrementAndGet();
                    updatePrefixIndex(key, false);
                }
            }
        }

        logger.fine(String.format("BATCH operation completed, %d entries", batch.size()));
        return Result.success(null, globalVersion.get());
    }

    // GET操作(读取)
    public byte[] get(String key) {
        lock.readLock().lock();
        try {
            VersionedValue value = store.get(key);
            if (value != null && !value.deleted) {
                return value.value.clone();
            }
            return null;
        } finally {
            lock.readLock().unlock();
        }
    }

    // 获取版本号
    public long getVersion(String key) {
        lock.readLock().lock();
        try {
            VersionedValue value = store.get(key);
            return value != null ? value.version : 0;
        } finally {
            lock.readLock().unlock();
        }
    }

    // SCAN操作
    public Map<String, byte[]> scan(String prefix, int limit) {
        lock.readLock().lock();
        try {
            Map<String, byte[]> results = new LinkedHashMap<>();
            int count = 0;

            // 使用前缀索引优化查找
            Set<String> candidateKeys = prefixIndex.get(prefix);
            if (candidateKeys != null) {
                for (String key : candidateKeys) {
                    if (count >= limit) break;

                    VersionedValue value = store.get(key);
                    if (value != null && !value.deleted) {
                        results.put(key, value.value.clone());
                        count++;
                    }
                }
            } else {
                // 回退到全扫描
                for (Map.Entry<String, VersionedValue> entry : store.entrySet()) {
                    if (count >= limit) break;

                    String key = entry.getKey();
                    if (key.startsWith(prefix)) {
                        VersionedValue value = entry.getValue();
                        if (!value.deleted) {
                            results.put(key, value.value.clone());
                            count++;
                        }
                    }
                }
            }

            return results;
        } finally {
            lock.readLock().unlock();
        }
    }

    // 更新前缀索引
    private void updatePrefixIndex(String key, boolean add) {
        // 为简化实现,只索引前2个字符的前缀
        if (key.length() >= 2) {
            String prefix = key.substring(0, 2);
            prefixIndex.computeIfAbsent(prefix, k -> ConcurrentHashMap.newKeySet());

            if (add) {
                prefixIndex.get(prefix).add(key);
            } else {
                Set<String> keys = prefixIndex.get(prefix);
                if (keys != null) {
                    keys.remove(key);
                    if (keys.isEmpty()) {
                        prefixIndex.remove(prefix);
                    }
                }
            }
        }
    }

    @Override
    public byte[] createSnapshot() {
        lock.readLock().lock();
        try {
            Snapshot snapshot = new Snapshot(store, globalVersion.get());
            return serializeSnapshot(snapshot);
        } finally {
            lock.readLock().unlock();
        }
    }

    @Override
    public void restoreSnapshot(byte[] snapshotData) {
        lock.writeLock().lock();
        try {
            Snapshot snapshot = deserializeSnapshot(snapshotData);

            store.clear();
            prefixIndex.clear();

            store.putAll(snapshot.data);
            globalVersion.set(snapshot.version);

            // 重建索引
            long keyCount = 0;
            for (Map.Entry<String, VersionedValue> entry : store.entrySet()) {
                if (!entry.getValue().deleted) {
                    updatePrefixIndex(entry.getKey(), true);
                    keyCount++;
                }
            }

            totalKeys.set(keyCount);

            logger.info(String.format("Restored snapshot: %d keys, version=%d",
                keyCount, snapshot.version));

        } catch (Exception e) {
            logger.severe("Failed to restore snapshot: " + e.getMessage());
            throw new RuntimeException(e);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // 健康检查
    public boolean isHealthy() {
        return store != null && globalVersion.get() >= 0;
    }

    // 获取统计信息
    public Map<String, Object> getStatistics() {
        lock.readLock().lock();
        try {
            Map<String, Object> stats = new HashMap<>();
            stats.put("totalKeys", totalKeys.get());
            stats.put("totalOperations", totalOperations.get());
            stats.put("currentVersion", globalVersion.get());
            stats.put("storeSize", store.size());
            stats.put("indexSize", prefixIndex.size());
            return stats;
        } finally {
            lock.readLock().unlock();
        }
    }

    // 序列化和反序列化方法
    private byte[] serializeSnapshot(Snapshot snapshot) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(baos);

            oos.writeLong(snapshot.version);
            oos.writeInt(snapshot.data.size());

            for (Map.Entry<String, VersionedValue> entry : snapshot.data.entrySet()) {
                oos.writeUTF(entry.getKey());
                VersionedValue value = entry.getValue();
                oos.writeLong(value.version);
                oos.writeBoolean(value.deleted);
                if (!value.deleted && value.value != null) {
                    oos.writeInt(value.value.length);
                    oos.write(value.value);
                } else {
                    oos.writeInt(0);
                }
            }

            oos.close();
            return baos.toByteArray();

        } catch (Exception e) {
            throw new RuntimeException("Failed to serialize snapshot", e);
        }
    }

    private Snapshot deserializeSnapshot(byte[] data) {
        try {
            ByteArrayInputStream bais = new ByteArrayInputStream(data);
            ObjectInputStream ois = new ObjectInputStream(bais);

            long version = ois.readLong();
            int size = ois.readInt();

            Map<String, VersionedValue> storeData = new HashMap<>();

            for (int i = 0; i < size; i++) {
                String key = ois.readUTF();
                long valueVersion = ois.readLong();
                boolean deleted = ois.readBoolean();
                int valueLength = ois.readInt();

                byte[] value = null;
                if (valueLength > 0) {
                    value = new byte[valueLength];
                    ois.readFully(value);
                }

                storeData.put(key, new VersionedValue(value, valueVersion, deleted));
            }

            ois.close();
            return new Snapshot(storeData, version);

        } catch (Exception e) {
            throw new RuntimeException("Failed to deserialize snapshot", e);
        }
    }

    private Command deserializeCommand(byte[] data) throws Exception {
        ByteBuffer buffer = ByteBuffer.wrap(data);

        // 读取命令ID
        int idLength = buffer.getInt();
        byte[] idBytes = new byte[idLength];
        buffer.get(idBytes);
        String id = new String(idBytes);

        // 读取操作类型
        OperationType type = OperationType.values()[buffer.getInt()];

        switch (type) {
            case PUT:
                int keyLength = buffer.getInt();
                byte[] keyBytes = new byte[keyLength];
                buffer.get(keyBytes);
                String key = new String(keyBytes);

                int valueLength = buffer.getInt();
                byte[] value = new byte[valueLength];
                buffer.get(value);

                return new Command(id, type, key, value);

            case DELETE:
                keyLength = buffer.getInt();
                keyBytes = new byte[keyLength];
                buffer.get(keyBytes);
                key = new String(keyBytes);

                return new Command(id, type, key, null);

            case BATCH:
                int batchSize = buffer.getInt();
                Map<String, byte[]> batch = new HashMap<>();

                for (int i = 0; i < batchSize; i++) {
                    keyLength = buffer.getInt();
                    keyBytes = new byte[keyLength];
                    buffer.get(keyBytes);
                    key = new String(keyBytes);

                    valueLength = buffer.getInt();
                    if (valueLength > 0) {
                        value = new byte[valueLength];
                        buffer.get(value);
                        batch.put(key, value);
                    } else {
                        batch.put(key, null); // 表示删除
                    }
                }

                return new Command(id, batch);

            default:
                throw new IllegalArgumentException("Unsupported operation type: " + type);
        }
    }
}

3. 客户端SDK实现

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
public class RaftKVClient {
    private static final Logger logger = Logger.getLogger(RaftKVClient.class.getName());

    // 客户端配置
    public static class ClientConfig {
        public final List<String> serverAddresses;
        public final Duration requestTimeout = Duration.ofSeconds(5);
        public final Duration connectionTimeout = Duration.ofSeconds(3);
        public final int maxRetries = 3;
        public final Duration retryDelay = Duration.ofMillis(100);
        public final boolean enableLoadBalancing = true;
        public final boolean enableCompression = true;

        public ClientConfig(List<String> serverAddresses) {
            this.serverAddresses = new ArrayList<>(serverAddresses);
        }
    }

    // 连接管理器
    public static class ConnectionManager {
        private final Map<String, SocketChannel> connections = new ConcurrentHashMap<>();
        private final AtomicInteger roundRobinIndex = new AtomicInteger(0);
        private final List<String> serverAddresses;
        private volatile String currentLeader = null;

        public ConnectionManager(List<String> serverAddresses) {
            this.serverAddresses = new ArrayList<>(serverAddresses);
        }

        public SocketChannel getConnection(String address) throws Exception {
            return connections.computeIfAbsent(address, addr -> {
                try {
                    String[] parts = addr.split(":");
                    String host = parts[0];
                    int port = Integer.parseInt(parts[1]);

                    SocketChannel channel = SocketChannel.open();
                    channel.connect(new InetSocketAddress(host, port));
                    channel.configureBlocking(true);

                    return channel;
                } catch (Exception e) {
                    throw new RuntimeException("Failed to connect to " + addr, e);
                }
            });
        }

        public String selectServer(boolean preferLeader) {
            if (preferLeader && currentLeader != null) {
                return currentLeader;
            }

            // 负载均衡选择
            int index = roundRobinIndex.getAndIncrement() % serverAddresses.size();
            return serverAddresses.get(index);
        }

        public void updateLeader(String leaderId) {
            this.currentLeader = leaderId;
        }

        public void removeConnection(String address) {
            SocketChannel connection = connections.remove(address);
            if (connection != null) {
                try {
                    connection.close();
                } catch (Exception e) {
                    // 忽略关闭错误
                }
            }
        }
    }

    private final ClientConfig config;
    private final ConnectionManager connectionManager;
    private final AtomicLong requestIdGenerator = new AtomicLong(0);
    private final ExecutorService executor;

    public RaftKVClient(ClientConfig config) {
        this.config = config;
        this.connectionManager = new ConnectionManager(config.serverAddresses);
        this.executor = Executors.newFixedThreadPool(4);
    }

    // GET操作
    public CompletableFuture<byte[]> get(String key) {
        return executeRequest(new Command(generateRequestId(), OperationType.GET, key, null), false)
            .thenApply(result -> {
                if (result.success) {
                    return result.value;
                } else {
                    throw new RuntimeException("GET failed: " + result.error);
                }
            });
    }

    // PUT操作
    public CompletableFuture<Long> put(String key, byte[] value) {
        return executeRequest(new Command(generateRequestId(), OperationType.PUT, key, value), true)
            .thenApply(result -> {
                if (result.success) {
                    return result.version;
                } else {
                    throw new RuntimeException("PUT failed: " + result.error);
                }
            });
    }

    // DELETE操作
    public CompletableFuture<Long> delete(String key) {
        return executeRequest(new Command(generateRequestId(), OperationType.DELETE, key, null), true)
            .thenApply(result -> {
                if (result.success) {
                    return result.version;
                } else {
                    throw new RuntimeException("DELETE failed: " + result.error);
                }
            });
    }

    // SCAN操作
    public CompletableFuture<Map<String, byte[]>> scan(String prefix, int limit) {
        return executeRequest(new Command(generateRequestId(), prefix, limit), false)
            .thenApply(result -> {
                if (result.success) {
                    return result.scanResults != null ? result.scanResults : new HashMap<>();
                } else {
                    throw new RuntimeException("SCAN failed: " + result.error);
                }
            });
    }

    // 批量操作
    public CompletableFuture<Long> batch(Map<String, byte[]> operations) {
        return executeRequest(new Command(generateRequestId(), operations), true)
            .thenApply(result -> {
                if (result.success) {
                    return result.version;
                } else {
                    throw new RuntimeException("BATCH failed: " + result.error);
                }
            });
    }

    // 执行请求
    private CompletableFuture<Result> executeRequest(Command command, boolean needLeader) {
        return CompletableFuture.supplyAsync(() -> {
            Exception lastException = null;

            for (int attempt = 0; attempt < config.maxRetries; attempt++) {
                try {
                    String serverAddress = connectionManager.selectServer(needLeader);
                    SocketChannel connection = connectionManager.getConnection(serverAddress);

                    // 发送请求
                    sendRequest(connection, command);

                    // 接收响应
                    Result result = receiveResponse(connection);

                    // 处理重定向
                    if (!result.success && result.error != null && result.error.startsWith("Not leader")) {
                        String[] parts = result.error.split(": ");
                        if (parts.length > 1) {
                            connectionManager.updateLeader(parts[1]);
                            continue; // 重试
                        }
                    }

                    return result;

                } catch (Exception e) {
                    lastException = e;
                    logger.warning(String.format("Request attempt %d failed: %s",
                        attempt + 1, e.getMessage()));

                    if (attempt < config.maxRetries - 1) {
                        try {
                            Thread.sleep(config.retryDelay.toMillis() * (attempt + 1));
                        } catch (InterruptedException ie) {
                            Thread.currentThread().interrupt();
                            break;
                        }
                    }
                }
            }

            throw new RuntimeException("Request failed after " + config.maxRetries +
                " attempts", lastException);

        }, executor);
    }

    // 发送请求
    private void sendRequest(SocketChannel connection, Command command) throws Exception {
        byte[] requestData = serializeRequest(command);

        if (config.enableCompression && requestData.length > 1024) {
            requestData = compress(requestData);
        }

        ByteBuffer header = ByteBuffer.allocate(8);
        header.putInt(requestData.length);
        header.putInt(config.enableCompression ? 1 : 0);
        header.flip();

        connection.write(header);
        connection.write(ByteBuffer.wrap(requestData));
    }

    // 接收响应
    private Result receiveResponse(SocketChannel connection) throws Exception {
        ByteBuffer header = ByteBuffer.allocate(8);
        readFully(connection, header);
        header.flip();

        int responseLength = header.getInt();
        int compressed = header.getInt();

        ByteBuffer responseBuffer = ByteBuffer.allocate(responseLength);
        readFully(connection, responseBuffer);
        responseBuffer.flip();

        byte[] responseData = new byte[responseLength];
        responseBuffer.get(responseData);

        if (compressed == 1) {
            responseData = decompress(responseData);
        }

        return deserializeResponse(responseData);
    }

    // 读取完整数据
    private void readFully(SocketChannel channel, ByteBuffer buffer) throws Exception {
        while (buffer.hasRemaining()) {
            int bytesRead = channel.read(buffer);
            if (bytesRead == -1) {
                throw new IOException("Connection closed");
            }
        }
    }

    // 生成请求ID
    private String generateRequestId() {
        return "req_" + System.currentTimeMillis() + "_" + requestIdGenerator.incrementAndGet();
    }

    // 序列化和压缩方法
    private byte[] serializeRequest(Command command) throws Exception {
        // 实现请求序列化
        return new byte[0]; // 简化实现
    }

    private Result deserializeResponse(byte[] data) throws Exception {
        // 实现响应反序列化
        return null; // 简化实现
    }

    private byte[] compress(byte[] data) {
        // 实现数据压缩
        return data; // 简化实现
    }

    private byte[] decompress(byte[] data) {
        // 实现数据解压缩
        return data; // 简化实现
    }

    // 关闭客户端
    public void close() {
        executor.shutdown();
        connectionManager.connections.values().forEach(connection -> {
            try {
                connection.close();
            } catch (Exception e) {
                // 忽略关闭错误
            }
        });
    }
}

4. 性能监控与管理

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
public class PerformanceMonitor {
    private static final Logger logger = Logger.getLogger(PerformanceMonitor.class.getName());

    // 性能指标
    public static class Metrics {
        public final AtomicLong totalRequests = new AtomicLong(0);
        public final AtomicLong successfulRequests = new AtomicLong(0);
        public final AtomicLong failedRequests = new AtomicLong(0);
        public final Map<OperationType, AtomicLong> requestsByType = new ConcurrentHashMap<>();
        public final Map<OperationType, LongAdder> latencySum = new ConcurrentHashMap<>();
        public final Map<OperationType, AtomicLong> latencyCount = new ConcurrentHashMap<>();

        public Metrics() {
            for (OperationType type : OperationType.values()) {
                requestsByType.put(type, new AtomicLong(0));
                latencySum.put(type, new LongAdder());
                latencyCount.put(type, new AtomicLong(0));
            }
        }
    }

    private final Metrics metrics = new Metrics();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile boolean isRunning = false;

    public void start() {
        if (isRunning) {
            return;
        }

        isRunning = true;
        scheduler.scheduleWithFixedDelay(this::collectMetrics, 10, 10, TimeUnit.SECONDS);
    }

    public void stop() {
        isRunning = false;
        scheduler.shutdown();
    }

    public void recordRequest(OperationType type) {
        metrics.totalRequests.incrementAndGet();
        metrics.requestsByType.get(type).incrementAndGet();
    }

    public void recordSuccess(OperationType type) {
        metrics.successfulRequests.incrementAndGet();
    }

    public void recordFailure(OperationType type) {
        metrics.failedRequests.incrementAndGet();
    }

    public void recordLatency(OperationType type, long latencyNanos) {
        metrics.latencySum.get(type).add(latencyNanos);
        metrics.latencyCount.get(type).incrementAndGet();
    }

    private void collectMetrics() {
        try {
            Map<String, Object> metricsData = new HashMap<>();

            metricsData.put("total_requests", metrics.totalRequests.get());
            metricsData.put("successful_requests", metrics.successfulRequests.get());
            metricsData.put("failed_requests", metrics.failedRequests.get());

            // 按操作类型统计
            for (OperationType type : OperationType.values()) {
                String prefix = type.name().toLowerCase();
                metricsData.put(prefix + "_requests", metrics.requestsByType.get(type).get());

                long count = metrics.latencyCount.get(type).get();
                if (count > 0) {
                    long sum = metrics.latencySum.get(type).sum();
                    double avgLatency = (double) sum / count / 1_000_000; // 转换为毫秒
                    metricsData.put(prefix + "_avg_latency_ms", avgLatency);
                }
            }

            // 计算QPS
            long totalRequests = metrics.totalRequests.get();
            // 简化实现,实际应该基于时间窗口计算
            metricsData.put("qps", totalRequests / Math.max(1, System.currentTimeMillis() / 1000));

            logger.info("Performance metrics: " + metricsData);

        } catch (Exception e) {
            logger.warning("Failed to collect metrics: " + e.getMessage());
        }
    }

    public void logStatistics() {
        Map<String, Object> stats = getStatistics();
        logger.info("=== Performance Statistics ===");
        stats.forEach((key, value) -> logger.info(String.format("%s: %s", key, value)));
    }

    public Map<String, Object> getStatistics() {
        Map<String, Object> stats = new HashMap<>();

        long total = metrics.totalRequests.get();
        long successful = metrics.successfulRequests.get();
        long failed = metrics.failedRequests.get();

        stats.put("total_requests", total);
        stats.put("successful_requests", successful);
        stats.put("failed_requests", failed);
        stats.put("success_rate", total > 0 ? (double) successful / total * 100 : 0);

        for (OperationType type : OperationType.values()) {
            String prefix = type.name().toLowerCase();
            stats.put(prefix + "_requests", metrics.requestsByType.get(type).get());

            long count = metrics.latencyCount.get(type).get();
            if (count > 0) {
                long sum = metrics.latencySum.get(type).sum();
                double avgLatency = (double) sum / count / 1_000_000;
                stats.put(prefix + "_avg_latency_ms", String.format("%.2f", avgLatency));
            }
        }

        return stats;
    }
}

集成测试与基准测试

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
public class RaftKVSystemTest {
    private static final Logger logger = Logger.getLogger(RaftKVSystemTest.class.getName());

    public static void main(String[] args) throws Exception {
        // 运行完整的系统测试
        runBasicFunctionalityTest();
        runConsistencyTest();
        runFailoverTest();
        runPerformanceBenchmark();
        runScalabilityTest();
    }

    // 基本功能测试
    private static void runBasicFunctionalityTest() throws Exception {
        System.out.println("=== 基本功能测试 ===");

        // 创建3节点集群
        List<RaftKVServer> cluster = createCluster(3);
        RaftKVClient client = createClient(cluster);

        try {
            // 测试基本的CRUD操作
            String testKey = "test_key";
            byte[] testValue = "test_value".getBytes();

            // PUT操作
            Long version = client.put(testKey, testValue).get();
            System.out.println("PUT操作成功,版本: " + version);

            // GET操作
            byte[] retrievedValue = client.get(testKey).get();
            System.out.println("GET操作成功,值: " + new String(retrievedValue));

            // DELETE操作
            Long deleteVersion = client.delete(testKey).get();
            System.out.println("DELETE操作成功,版本: " + deleteVersion);

            // 验证删除
            try {
                client.get(testKey).get();
                System.out.println("错误:应该抛出异常");
            } catch (Exception e) {
                System.out.println("DELETE验证成功:" + e.getMessage());
            }

            // 批量操作
            Map<String, byte[]> batch = new HashMap<>();
            for (int i = 0; i < 10; i++) {
                batch.put("batch_key_" + i, ("batch_value_" + i).getBytes());
            }

            Long batchVersion = client.batch(batch).get();
            System.out.println("批量操作成功,版本: " + batchVersion);

            // SCAN操作
            Map<String, byte[]> scanResults = client.scan("batch_", 5).get();
            System.out.printf("SCAN操作成功,找到 %d 个键\n", scanResults.size());

            System.out.println("基本功能测试通过!");

        } finally {
            client.close();
            shutdownCluster(cluster);
        }
    }

    // 一致性测试
    private static void runConsistencyTest() throws Exception {
        System.out.println("\n=== 一致性测试 ===");

        List<RaftKVServer> cluster = createCluster(5);
        List<RaftKVClient> clients = new ArrayList<>();

        // 为每个节点创建客户端
        for (RaftKVServer server : cluster) {
            RaftKVClient.ClientConfig config = new RaftKVClient.ClientConfig(
                Arrays.asList(server.getConfig().host + ":" + server.getConfig().port));
            clients.add(new RaftKVClient(config));
        }

        try {
            // 写入数据
            String key = "consistency_test";
            byte[] value = "initial_value".getBytes();

            RaftKVClient writeClient = clients.get(0);
            writeClient.put(key, value).get();

            // 等待数据同步
            Thread.sleep(1000);

            // 从所有节点读取数据,验证一致性
            boolean consistencyPassed = true;
            for (int i = 0; i < clients.size(); i++) {
                try {
                    byte[] readValue = clients.get(i).get(key).get();
                    String readString = new String(readValue);

                    if (!readString.equals("initial_value")) {
                        System.out.printf("节点 %d 数据不一致: %s\n", i, readString);
                        consistencyPassed = false;
                    }
                } catch (Exception e) {
                    System.out.printf("节点 %d 读取失败: %s\n", i, e.getMessage());
                    consistencyPassed = false;
                }
            }

            if (consistencyPassed) {
                System.out.println("一致性测试通过!");
            } else {
                System.out.println("一致性测试失败!");
            }

        } finally {
            clients.forEach(RaftKVClient::close);
            shutdownCluster(cluster);
        }
    }

    // 故障转移测试
    private static void runFailoverTest() throws Exception {
        System.out.println("\n=== 故障转移测试 ===");

        List<RaftKVServer> cluster = createCluster(5);
        RaftKVClient client = createClient(cluster);

        try {
            // 找到当前leader
            RaftKVServer currentLeader = null;
            for (RaftKVServer server : cluster) {
                if (server.isLeader()) {
                    currentLeader = server;
                    break;
                }
            }

            if (currentLeader == null) {
                System.out.println("没有找到leader,测试失败");
                return;
            }

            System.out.println("当前leader: " + currentLeader.getConfig().nodeId);

            // 写入一些数据
            for (int i = 0; i < 10; i++) {
                client.put("failover_key_" + i, ("value_" + i).getBytes()).get();
            }

            // 停止当前leader
            System.out.println("停止当前leader...");
            currentLeader.stop();

            // 等待新leader选举
            Thread.sleep(2000);

            // 验证系统仍然可用
            boolean systemAvailable = false;
            for (int attempt = 0; attempt < 10; attempt++) {
                try {
                    client.put("failover_test_after", "test_value".getBytes()).get();
                    systemAvailable = true;
                    break;
                } catch (Exception e) {
                    Thread.sleep(500);
                }
            }

            if (systemAvailable) {
                System.out.println("故障转移测试通过!");
            } else {
                System.out.println("故障转移测试失败!");
            }

            // 验证数据完整性
            for (int i = 0; i < 10; i++) {
                try {
                    byte[] value = client.get("failover_key_" + i).get();
                    String expected = "value_" + i;
                    if (!expected.equals(new String(value))) {
                        System.out.println("数据完整性检查失败!");
                        return;
                    }
                } catch (Exception e) {
                    System.out.println("读取数据失败: " + e.getMessage());
                    return;
                }
            }

            System.out.println("数据完整性检查通过!");

        } finally {
            client.close();
            shutdownCluster(cluster);
        }
    }

    // 性能基准测试
    private static void runPerformanceBenchmark() throws Exception {
        System.out.println("\n=== 性能基准测试 ===");

        List<RaftKVServer> cluster = createCluster(3);
        RaftKVClient client = createClient(cluster);

        try {
            // 写入性能测试
            int writeCount = 10000;
            byte[] testValue = new byte[1024]; // 1KB数据
            new Random().nextBytes(testValue);

            System.out.println("开始写入性能测试...");
            long writeStartTime = System.currentTimeMillis();

            List<CompletableFuture<Long>> writeFutures = new ArrayList<>();
            for (int i = 0; i < writeCount; i++) {
                writeFutures.add(client.put("perf_key_" + i, testValue));
            }

            CompletableFuture.allOf(writeFutures.toArray(new CompletableFuture[0])).join();
            long writeEndTime = System.currentTimeMillis();

            long writeDuration = writeEndTime - writeStartTime;
            double writeQPS = (double) writeCount / writeDuration * 1000;

            System.out.printf("写入测试完成: %d 请求, 耗时 %d ms, QPS: %.2f\n",
                writeCount, writeDuration, writeQPS);

            // 读取性能测试
            System.out.println("开始读取性能测试...");
            long readStartTime = System.currentTimeMillis();

            List<CompletableFuture<byte[]>> readFutures = new ArrayList<>();
            for (int i = 0; i < writeCount; i++) {
                readFutures.add(client.get("perf_key_" + i));
            }

            CompletableFuture.allOf(readFutures.toArray(new CompletableFuture[0])).join();
            long readEndTime = System.currentTimeMillis();

            long readDuration = readEndTime - readStartTime;
            double readQPS = (double) writeCount / readDuration * 1000;

            System.out.printf("读取测试完成: %d 请求, 耗时 %d ms, QPS: %.2f\n",
                writeCount, readDuration, readQPS);

        } finally {
            client.close();
            shutdownCluster(cluster);
        }
    }

    // 可扩展性测试
    private static void runScalabilityTest() throws Exception {
        System.out.println("\n=== 可扩展性测试 ===");

        int[] clusterSizes = {3, 5, 7};
        int requestCount = 1000;
        byte[] testValue = "scalability_test".getBytes();

        for (int size : clusterSizes) {
            System.out.printf("测试 %d 节点集群...\n", size);

            List<RaftKVServer> cluster = createCluster(size);
            RaftKVClient client = createClient(cluster);

            try {
                long startTime = System.currentTimeMillis();

                List<CompletableFuture<Long>> futures = new ArrayList<>();
                for (int i = 0; i < requestCount; i++) {
                    futures.add(client.put("scale_key_" + i, testValue));
                }

                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
                long endTime = System.currentTimeMillis();

                long duration = endTime - startTime;
                double qps = (double) requestCount / duration * 1000;

                System.out.printf("集群大小: %d, QPS: %.2f\n", size, qps);

            } finally {
                client.close();
                shutdownCluster(cluster);
            }
        }
    }

    // 辅助方法
    private static List<RaftKVServer> createCluster(int size) throws Exception {
        List<String> peers = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            peers.add("node" + i);
        }

        List<RaftKVServer> cluster = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            RaftKVServer.ServerConfig config = new RaftKVServer.ServerConfig(
                "node" + i, "localhost", 8080 + i);
            RaftKVServer server = new RaftKVServer(config, peers);
            cluster.add(server);
        }

        // 启动所有服务器
        for (RaftKVServer server : cluster) {
            server.start();
        }

        // 等待leader选举
        Thread.sleep(2000);

        return cluster;
    }

    private static RaftKVClient createClient(List<RaftKVServer> cluster) {
        List<String> addresses = cluster.stream()
            .map(server -> server.getConfig().host + ":" + server.getConfig().port)
            .collect(Collectors.toList());

        RaftKVClient.ClientConfig config = new RaftKVClient.ClientConfig(addresses);
        return new RaftKVClient(config);
    }

    private static void shutdownCluster(List<RaftKVServer> cluster) {
        for (RaftKVServer server : cluster) {
            server.stop();
        }
    }
}

部署与运维指南

1. 配置文件示例

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# raftkv.yaml
server:
  node_id: "node1"
  host: "127.0.0.1"
  port: 8080
  data_dir: "/data/raftkv"

cluster:
  peers:
    - "node1:127.0.0.1:8080"
    - "node2:127.0.0.1:8081"
    - "node3:127.0.0.1:8082"

raft:
  election_timeout: 300ms
  heartbeat_interval: 150ms
  snapshot_threshold: 10000
  max_log_entries: 1000

storage:
  sync_write: true
  compression: true
  cache_size: 100MB

monitoring:
  metrics_port: 9090
  health_check_port: 8090
  log_level: "INFO"

2. 启动脚本

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
#!/bin/bash

# start_raftkv.sh

NODE_ID=${1:-node1}
CONFIG_FILE=${2:- raftkv.yaml}
JVM_OPTS=${3:-"-Xmx2g -Xms1g"}

JAVA_OPTS="$JVM_OPTS -Dnode.id=$NODE_ID -Dconfig.file=$CONFIG_FILE"

echo "Starting RaftKV node: $NODE_ID"
echo "Config file: $CONFIG_FILE"
echo "JVM options: $JAVA_OPTS"

java $JAVA_OPTS -cp "lib/*" com.example.raftke.RaftKVMain

3. 监控告警

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
public class MonitoringService {
    public void setupAlerts() {
        // CPU使用率告警
        registerAlert("cpu_usage", (metrics) -> {
            double cpuUsage = metrics.getCpuUsage();
            return cpuUsage > 80.0;
        }, "CPU usage is high: {}%");

        // 内存使用率告警
        registerAlert("memory_usage", (metrics) -> {
            double memoryUsage = metrics.getMemoryUsage();
            return memoryUsage > 85.0;
        }, "Memory usage is high: {}%");

        // Leader选举告警
        registerAlert("leadership_change", (metrics) -> {
            long leadershipChanges = metrics.getLeadershipChanges();
            return leadershipChanges > 10; // 10分钟内超过10次
        }, "Frequent leadership changes detected: {} times");

        // 请求延迟告警
        registerAlert("high_latency", (metrics) -> {
            double avgLatency = metrics.getAverageLatency();
            return avgLatency > 1000.0; // 超过1秒
        }, "High request latency detected: {}ms");
    }
}

总结

通过本文的实战开发,我们完整实现了一个基于Raft算法的分布式KV存储系统,包含:

核心特性

  • 强一致性: 基于Raft算法保证数据一致性
  • 高可用性: 支持节点故障自动恢复
  • 分区容错: 能够在网络分区时继续提供服务
  • 性能优化: 批量操作、压缩传输、索引优化

技术亮点

  • 完整的Raft实现: Leader选举、日志复制、快照机制
  • 高性能存储引擎: 内存存储、前缀索引、版本控制
  • 智能客户端: 负载均衡、故障转移、重试机制
  • 全面监控: 性能指标、健康检查、告警系统

生产就绪特性

  • 配置管理: 灵活的配置文件支持
  • 运维工具: 启动脚本、监控面板、日志分析
  • 测试覆盖: 功能测试、性能测试、故障测试
  • 扩展性: 支持集群动态扩容

这个分布式KV系统展示了现代分布式系统开发的核心技术和最佳实践,可以作为学习分布式系统的参考实现,也可以作为生产环境的基础组件。