news 2026/4/16 14:51:03

GLM-4-9B-Chat-1M + Spring Boot Integration: Building an Enterprise-Grade API Service


Zhang Xiaoming

Front-end Development Engineer


Picture this scenario: your product team wants to add an intelligent Q&A feature to the internal knowledge base, one that can digest technical documents dozens of pages long and return precise answers. Traditional approaches either cannot handle a context that long, respond maddeningly slowly, or collapse as soon as concurrency picks up.

This is where GLM-4-9B-Chat-1M comes in. The model supports a 1M-token context window, roughly 2 million Chinese characters, which makes it an excellent fit for enterprise-scale long-document analysis. The question is: how do you integrate this model into your Spring Boot application so that it handles high concurrency and runs reliably?

That is today's topic. We will build an enterprise-grade GLM-4-9B-Chat-1M API service step by step: model deployment first, then Spring Boot integration, and finally load balancing and cache optimization, so you can start building as soon as you finish reading.

1. Model Deployment: A High-Performance Inference Service with vLLM

First, the model needs to be running. Loading GLM-4-9B-Chat-1M directly with Transformers gives disappointing long-context support and inference efficiency, so we will use vLLM instead: a framework purpose-built for large-model inference, with high throughput and solid memory management.

1.1 Environment Setup and Model Download

Start by preparing the server environment. GLM-4-9B-Chat-1M is demanding on GPU memory: inference at the full 1M context length takes roughly four 80 GB A100 cards. If you have less memory, lower the max_model_len parameter; at 8192, for example, around 18 GB of GPU memory is enough.
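The relationship between context length and memory can be sanity-checked with a rough KV-cache estimate. The architecture numbers below (layer count, KV heads, head dimension) are illustrative assumptions, not official GLM-4-9B specifications; substitute the real values from the model's config.json before relying on the result.

```java
// Rough KV-cache sizing sketch. The architecture numbers in main() are
// ASSUMPTIONS for illustration; read the real values from config.json.
public class KvCacheEstimate {

    static long kvBytesPerToken(int layers, int kvHeads, int headDim, int dtypeBytes) {
        // The factor 2 accounts for storing both the K and the V tensor per layer
        return 2L * layers * kvHeads * headDim * dtypeBytes;
    }

    public static void main(String[] args) {
        // Assumed values: 40 layers, 4 KV heads (GQA), head_dim 128, fp16 (2 bytes)
        long perToken = kvBytesPerToken(40, 4, 128, 2);
        long forContext = perToken * 8192;   // at max_model_len = 8192
        System.out.printf("KV cache: %d bytes/token, %.0f MiB for 8192 tokens%n",
                perToken, forContext / (1024.0 * 1024.0));
    }
}
```

At these assumed values the KV cache at 8192 tokens is small next to the roughly 18 GB of fp16 weights, which is why a reduced max_model_len fits on a single large GPU while the full 1M context does not.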

```shell
# Deploy the vLLM environment with Docker
docker run -d -t --rm --net=host --gpus all \
  --privileged \
  --ipc=host \
  --name vllm \
  -v /path/to/your/models:/models \
  egs-registry.cn-hangzhou.cr.aliyuncs.com/egs/vllm:0.4.0.post1-pytorch2.1.2-cuda12.1.1-cudnn8-ubuntu22.04

# Enter the container
docker exec -it vllm /bin/bash

# Download the model from ModelScope
pip install modelscope
modelscope download --model ZhipuAI/glm-4-9b-chat-1m
```

1.2 Starting the vLLM Inference Service

Once the model is downloaded, start the service. A few parameters deserve attention:

```shell
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8005 \
  --model /models/glm-4-9b-chat-1m \
  --dtype float16 \
  --trust-remote-code \
  --served-model-name glm4-9b-chat \
  --api-key your-api-key \
  --max_model_len 8192 \
  --enforce-eager
```

A quick rundown of these parameters:

  • --max_model_len 8192: maximum context length; adjust it to fit your GPU memory
  • --enforce-eager: disables CUDA graph capture, which avoids some graph-optimization issues and makes inference more stable
  • --trust-remote-code: required for GLM models, whose modeling code ships with the checkpoint

Once the service is up, test it with curl:

```shell
curl --location 'http://localhost:8005/v1/chat/completions' \
  --header 'Authorization: Bearer your-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "glm4-9b-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, please introduce yourself."}
    ]
  }'
```

If you get a well-formed response back, the model service is up and running.
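If you would rather smoke-test from Java, matching the stack used in the rest of this article, a minimal sketch with the JDK's built-in java.net.http client looks like this. The endpoint, API key, and model name are the same placeholder values as in the curl example; the network call is gated behind an environment variable so the class can be compiled and exercised without a live server.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VllmSmokeTest {

    // Build the same JSON body as the curl example; kept as a separate
    // method so it can be inspected without a running server.
    static String buildBody(String model, String userMessage) {
        return "{\"model\":\"" + model + "\",\"messages\":["
             + "{\"role\":\"system\",\"content\":\"You are a helpful assistant.\"},"
             + "{\"role\":\"user\",\"content\":\"" + userMessage + "\"}]}";
    }

    public static void main(String[] args) throws Exception {
        String body = buildBody("glm4-9b-chat", "Hello, please introduce yourself.");
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8005/v1/chat/completions"))
                .header("Authorization", "Bearer your-api-key")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        System.out.println("POST " + request.uri());
        // Only hit the network when a vLLM server is actually running locally
        if (System.getenv("VLLM_SMOKE") != null) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```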

2. Spring Boot Integration: Building an Enterprise-Grade API Gateway

With the model service in place, the next step is calling it from Spring Boot. We are not after a bare HTTP call, but a complete enterprise-grade solution.

2.1 Project Structure and Dependencies

Create a standard Spring Boot project and add the required dependencies:

```xml
<!-- pom.xml -->
<dependencies>
    <!-- Spring Boot web starter -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Pooled HTTP client -->
    <dependency>
        <groupId>org.apache.httpcomponents.client5</groupId>
        <artifactId>httpclient5</artifactId>
    </dependency>
    <!-- Cache abstraction -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-cache</artifactId>
    </dependency>
    <!-- Redis cache backend -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <!-- Configuration property validation -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
    <!-- Monitoring and health checks -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <!-- Spring Retry + AOP, needed for the @Retryable annotation used later -->
    <dependency>
        <groupId>org.springframework.retry</groupId>
        <artifactId>spring-retry</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
    <!-- Lombok, used for @Data / @Slf4j / @Builder in the snippets below -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
```

One thing worth flagging: @Retryable only takes effect once @EnableRetry is declared on a configuration class.

2.2 Configuration Management

Put the vLLM service settings in application.yml so they are easy to change later:

```yaml
# application.yml
glm:
  vllm:
    # Multiple vLLM instances for load balancing
    servers:
      - url: http://192.168.1.100:8005
        api-key: your-api-key-1
        weight: 50   # weight used by the load balancer
      - url: http://192.168.1.101:8005
        api-key: your-api-key-2
        weight: 50
    # Connection pool settings
    connection:
      max-total: 100
      default-max-per-route: 20
      connect-timeout: 5000
      socket-timeout: 30000
    # Retry policy
    retry:
      max-attempts: 3
      backoff-delay: 1000
    # Cache settings
    cache:
      enabled: true
      ttl: 3600   # cache for one hour
```

The matching configuration class looks like this:

```java
@Configuration
@ConfigurationProperties(prefix = "glm.vllm")
@Data
public class VllmConfig {

    private List<VllmServer> servers;
    private ConnectionConfig connection;
    private RetryConfig retry;
    private CacheConfig cache;

    @Data
    public static class VllmServer {
        private String url;
        private String apiKey;
        private Integer weight;
    }

    @Data
    public static class ConnectionConfig {
        private Integer maxTotal;
        private Integer defaultMaxPerRoute;
        private Integer connectTimeout;
        private Integer socketTimeout;
    }

    @Data
    public static class RetryConfig {
        private Integer maxAttempts;
        private Integer backoffDelay;
    }

    @Data
    public static class CacheConfig {
        private Boolean enabled;
        private Integer ttl;
    }
}
```

2.3 Load-Balancing Client

This is the core piece: a load-balancing client that automatically picks an available vLLM instance:

```java
@Service
@Slf4j
public class VllmLoadBalancer {

    private final List<VllmServer> servers;
    private final AtomicInteger currentIndex = new AtomicInteger(0);
    private final Map<String, ServerHealth> serverHealthMap = new ConcurrentHashMap<>();

    @Autowired
    public VllmLoadBalancer(VllmConfig vllmConfig) {
        this.servers = vllmConfig.getServers();
        servers.forEach(server ->
                serverHealthMap.put(server.getUrl(), new ServerHealth()));
    }

    /**
     * Pick an available server (weighted round-robin).
     */
    public VllmServer getAvailableServer() {
        List<VllmServer> healthyServers = servers.stream()
                .filter(server -> serverHealthMap.get(server.getUrl()).isHealthy())
                .collect(Collectors.toList());
        if (healthyServers.isEmpty()) {
            throw new RuntimeException("No healthy vLLM servers available");
        }
        // Weighted round-robin: walk the cumulative weights until the
        // rotating index falls inside a server's weight band
        int totalWeight = healthyServers.stream()
                .mapToInt(VllmServer::getWeight)
                .sum();
        int index = currentIndex.getAndUpdate(i -> (i + 1) % totalWeight);
        int currentWeight = 0;
        for (VllmServer server : healthyServers) {
            currentWeight += server.getWeight();
            if (index < currentWeight) {
                return server;
            }
        }
        return healthyServers.get(0);
    }

    /**
     * Update a server's health state after a request.
     */
    public void updateServerHealth(String url, boolean isHealthy) {
        ServerHealth health = serverHealthMap.get(url);
        if (health != null) {
            health.update(isHealthy);
        }
    }

    /**
     * Per-server health tracking.
     */
    private static class ServerHealth {
        private volatile boolean healthy = true;
        private volatile long lastCheckTime = System.currentTimeMillis();
        private final AtomicInteger failureCount = new AtomicInteger(0);

        public boolean isHealthy() {
            // Three consecutive failures mark the server unhealthy
            if (failureCount.get() >= 3) {
                // Allow a retry after 30 seconds
                if (System.currentTimeMillis() - lastCheckTime > 30000) {
                    failureCount.set(0);
                    healthy = true;
                }
                return false;
            }
            return healthy;
        }

        public void update(boolean isHealthy) {
            lastCheckTime = System.currentTimeMillis();
            if (isHealthy) {
                failureCount.set(0);
                healthy = true;
            } else {
                failureCount.incrementAndGet();
                if (failureCount.get() >= 3) {
                    healthy = false;
                }
            }
        }
    }
}
```

2.4 HTTP Client with Retry and Caching

Next, the HTTP client, with retry and caching built in:

```java
@Component
@Slf4j
public class VllmHttpClient {

    private final CloseableHttpClient httpClient;
    private final VllmLoadBalancer loadBalancer;
    private final CacheManager cacheManager;
    private final VllmConfig vllmConfig;

    @Autowired
    public VllmHttpClient(VllmLoadBalancer loadBalancer,
                          CacheManager cacheManager,
                          VllmConfig vllmConfig) {
        this.loadBalancer = loadBalancer;
        this.cacheManager = cacheManager;
        this.vllmConfig = vllmConfig;

        // Configure the pooled connection manager
        PoolingHttpClientConnectionManager connectionManager =
                new PoolingHttpClientConnectionManager();
        connectionManager.setMaxTotal(vllmConfig.getConnection().getMaxTotal());
        connectionManager.setDefaultMaxPerRoute(
                vllmConfig.getConnection().getDefaultMaxPerRoute());

        // HttpClient 5 takes Timeout objects; the configured socket timeout
        // maps to the response timeout here
        this.httpClient = HttpClients.custom()
                .setConnectionManager(connectionManager)
                .setDefaultRequestConfig(RequestConfig.custom()
                        .setConnectTimeout(Timeout.ofMilliseconds(
                                vllmConfig.getConnection().getConnectTimeout()))
                        .setResponseTimeout(Timeout.ofMilliseconds(
                                vllmConfig.getConnection().getSocketTimeout()))
                        .build())
                .build();
    }

    /**
     * Send a request to the vLLM service (with retry).
     */
    @Retryable(value = {IOException.class, RuntimeException.class},
            maxAttempts = 3, backoff = @Backoff(delay = 1000))
    public String sendRequest(ChatRequest request) {
        VllmServer server = loadBalancer.getAvailableServer();
        String cacheKey = null;

        // Check the cache first if caching is enabled
        if (vllmConfig.getCache().getEnabled() && request.isCacheable()) {
            cacheKey = generateCacheKey(request);
            Cache cache = cacheManager.getCache("vllmResponses");
            if (cache != null) {
                Cache.ValueWrapper cached = cache.get(cacheKey);
                if (cached != null) {
                    log.info("Cache hit for key: {}", cacheKey);
                    return (String) cached.get();
                }
            }
        }

        try {
            String response = doSendRequest(server, request);
            // Mark the server healthy on success
            loadBalancer.updateServerHealth(server.getUrl(), true);
            // Cache the response
            if (cacheKey != null) {
                Cache cache = cacheManager.getCache("vllmResponses");
                if (cache != null) {
                    cache.put(cacheKey, response);
                }
            }
            return response;
        } catch (Exception e) {
            // Mark the server unhealthy on failure
            loadBalancer.updateServerHealth(server.getUrl(), false);
            // Re-wrap checked exceptions so the caller need not declare them
            throw e instanceof RuntimeException
                    ? (RuntimeException) e : new RuntimeException(e);
        }
    }

    private String doSendRequest(VllmServer server, ChatRequest request)
            throws IOException, ParseException {
        String url = server.getUrl() + "/v1/chat/completions";
        HttpPost httpPost = new HttpPost(url);
        httpPost.setHeader("Authorization", "Bearer " + server.getApiKey());
        httpPost.setHeader("Content-Type", "application/json");

        // Build the request body
        ObjectMapper mapper = new ObjectMapper();
        Map<String, Object> requestBody = new HashMap<>();
        requestBody.put("model", "glm4-9b-chat");
        requestBody.put("messages", request.getMessages());
        requestBody.put("max_tokens", request.getMaxTokens());
        requestBody.put("temperature", request.getTemperature());
        String jsonBody = mapper.writeValueAsString(requestBody);
        httpPost.setEntity(new StringEntity(jsonBody, StandardCharsets.UTF_8));

        try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
            int statusCode = response.getCode();
            if (statusCode == 200) {
                return EntityUtils.toString(response.getEntity());
            } else {
                throw new IOException("HTTP error: " + statusCode);
            }
        }
    }

    private String generateCacheKey(ChatRequest request) {
        try {
            // Derive the cache key from the message content and parameters
            String messagesJson = new ObjectMapper().writeValueAsString(request.getMessages());
            String params = String.format("maxtokens-%d-temp-%.2f",
                    request.getMaxTokens(), request.getTemperature());
            return DigestUtils.md5DigestAsHex((messagesJson + params).getBytes());
        } catch (JsonProcessingException e) {
            throw new IllegalStateException("Failed to serialize messages for cache key", e);
        }
    }
}
```

2.5 Service Layer

Now wrap the HTTP client in a business-level service:

```java
@Service
@Slf4j
public class ChatService {

    private final VllmHttpClient vllmHttpClient;
    private final ObjectMapper objectMapper;

    @Autowired
    public ChatService(VllmHttpClient vllmHttpClient) {
        this.vllmHttpClient = vllmHttpClient;
        this.objectMapper = new ObjectMapper();
    }

    /**
     * Single-turn chat.
     */
    public ChatResponse chat(String userMessage) {
        ChatRequest request = ChatRequest.builder()
                .messages(Arrays.asList(
                        Message.builder()
                                .role("user")
                                .content(userMessage)
                                .build()))
                .maxTokens(1024)
                .temperature(0.7)
                .cacheable(true)
                .build();
        try {
            String responseJson = vllmHttpClient.sendRequest(request);
            return parseResponse(responseJson);
        } catch (Exception e) {
            log.error("Chat request failed", e);
            throw new BusinessException("The AI service is temporarily unavailable, please try again later");
        }
    }

    /**
     * Multi-turn chat (long-context aware).
     */
    public ChatResponse chatWithHistory(List<Message> history, String newMessage) {
        List<Message> messages = new ArrayList<>(history);
        messages.add(Message.builder()
                .role("user")
                .content(newMessage)
                .build());
        ChatRequest request = ChatRequest.builder()
                .messages(messages)
                .maxTokens(2048)      // long conversations need more tokens
                .temperature(0.7)
                .cacheable(false)     // do not cache history-dependent replies
                .build();
        try {
            String responseJson = vllmHttpClient.sendRequest(request);
            return parseResponse(responseJson);
        } catch (Exception e) {
            log.error("Chat with history failed", e);
            throw new BusinessException("The AI service is temporarily unavailable, please try again later");
        }
    }

    /**
     * Streaming response (for long generations).
     */
    public void chatStream(String userMessage, SseEmitter emitter) {
        // Simplified here; production code should call vLLM's streaming
        // endpoint (stream=true) instead of chunking a complete response
        ChatRequest request = ChatRequest.builder()
                .messages(Arrays.asList(
                        Message.builder()
                                .role("user")
                                .content(userMessage)
                                .build()))
                .maxTokens(2048)
                .temperature(0.7)
                .cacheable(false)
                .build();
        try {
            String responseJson = vllmHttpClient.sendRequest(request);
            ChatResponse response = parseResponse(responseJson);
            // Simulate streaming by sending the answer in chunks
            String content = response.getContent();
            int chunkSize = 50;
            for (int i = 0; i < content.length(); i += chunkSize) {
                String chunk = content.substring(i, Math.min(i + chunkSize, content.length()));
                emitter.send(SseEmitter.event()
                        .data(chunk)
                        .id(String.valueOf(i)));
                Thread.sleep(50);   // simulate generation latency
            }
            emitter.complete();
        } catch (Exception e) {
            emitter.completeWithError(e);
        }
    }

    private ChatResponse parseResponse(String responseJson) throws JsonProcessingException {
        JsonNode root = objectMapper.readTree(responseJson);
        String content = root.path("choices")
                .get(0)
                .path("message")
                .path("content")
                .asText();
        return ChatResponse.builder()
                .content(content)
                .tokensUsed(root.path("usage").path("total_tokens").asInt())
                .build();
    }
}
```

2.6 Controller Layer

Finally, the API endpoints exposed to callers:

```java
@RestController
@RequestMapping("/api/v1/chat")
@Validated
@Slf4j
public class ChatController {

    private final ChatService chatService;
    // Dedicated executor for dispatching streaming responses
    private final ExecutorService executorService = Executors.newCachedThreadPool();

    @Autowired
    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    /**
     * Simple chat endpoint.
     */
    @PostMapping("/simple")
    public ResponseEntity<ApiResponse<ChatResponse>> simpleChat(
            @Valid @RequestBody SimpleChatRequest request) {
        try {
            ChatResponse response = chatService.chat(request.getMessage());
            return ResponseEntity.ok(ApiResponse.success(response));
        } catch (BusinessException e) {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .body(ApiResponse.error(e.getMessage()));
        }
    }

    /**
     * Chat with conversation history.
     */
    @PostMapping("/with-history")
    public ResponseEntity<ApiResponse<ChatResponse>> chatWithHistory(
            @Valid @RequestBody HistoryChatRequest request) {
        try {
            ChatResponse response = chatService.chatWithHistory(
                    request.getHistory(), request.getMessage());
            return ResponseEntity.ok(ApiResponse.success(response));
        } catch (BusinessException e) {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .body(ApiResponse.error(e.getMessage()));
        }
    }

    /**
     * Streaming chat endpoint.
     */
    @GetMapping("/stream")
    public SseEmitter streamChat(@RequestParam String message) {
        SseEmitter emitter = new SseEmitter(30000L);   // 30-second timeout
        executorService.execute(() -> chatService.chatStream(message, emitter));
        return emitter;
    }

    /**
     * Health-check endpoint.
     */
    @GetMapping("/health")
    public ResponseEntity<Map<String, Object>> healthCheck() {
        Map<String, Object> health = new HashMap<>();
        health.put("status", "UP");
        health.put("timestamp", System.currentTimeMillis());
        try {
            // Issue a trivial request against the model
            chatService.chat("Hello");
            health.put("vllmService", "UP");
        } catch (Exception e) {
            health.put("vllmService", "DOWN");
            health.put("error", e.getMessage());
        }
        return ResponseEntity.ok(health);
    }
}
```

3. Advanced Features and Optimization

3.1 Rate Limiting and Circuit Breaking

Under high concurrency, rate limiting and circuit breaking are essential. Resilience4j handles both:

```java
@Configuration
public class ResilienceConfig {

    @Bean
    public RateLimiterRegistry rateLimiterRegistry() {
        return RateLimiterRegistry.of(
                RateLimiterConfig.custom()
                        .limitRefreshPeriod(Duration.ofSeconds(1))
                        .limitForPeriod(10)                     // 10 requests per second
                        .timeoutDuration(Duration.ofMillis(100))
                        .build());
    }

    @Bean
    public CircuitBreakerRegistry circuitBreakerRegistry() {
        return CircuitBreakerRegistry.of(
                CircuitBreakerConfig.custom()
                        .failureRateThreshold(50)               // trip at a 50% failure rate
                        .waitDurationInOpenState(Duration.ofSeconds(30))
                        .slidingWindowSize(10)
                        .minimumNumberOfCalls(5)
                        .build());
    }
}

// Usage in the service layer. Note that the @RateLimiter and @CircuitBreaker
// annotations require the resilience4j-spring-boot starter on the classpath.
@Service
public class ProtectedChatService {

    private final ChatService chatService;
    private final RateLimiter rateLimiter;
    private final CircuitBreaker circuitBreaker;

    public ProtectedChatService(ChatService chatService,
                                RateLimiterRegistry rateLimiterRegistry,
                                CircuitBreakerRegistry circuitBreakerRegistry) {
        this.chatService = chatService;
        this.rateLimiter = rateLimiterRegistry.rateLimiter("chatLimiter");
        this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("chatBreaker");
    }

    @RateLimiter(name = "chatLimiter")
    @CircuitBreaker(name = "chatBreaker", fallbackMethod = "fallbackChat")
    public ChatResponse protectedChat(String message) {
        return chatService.chat(message);
    }

    private ChatResponse fallbackChat(String message, Throwable t) {
        return ChatResponse.builder()
                .content("The service is temporarily unavailable, please try again later")
                .fallback(true)
                .build();
    }
}
```

3.2 Monitoring and Metrics

Monitoring is indispensable in production. Micrometer collects the metrics:

```java
@Component
public class ChatMetrics {

    private final Timer chatTimer;
    private final Counter successCounter;
    private final Counter failureCounter;

    public ChatMetrics(MeterRegistry meterRegistry) {
        this.chatTimer = Timer.builder("chat.request.duration")
                .description("Chat request duration")
                .register(meterRegistry);
        this.successCounter = Counter.builder("chat.request.success")
                .description("Successful chat requests")
                .register(meterRegistry);
        this.failureCounter = Counter.builder("chat.request.failure")
                .description("Failed chat requests")
                .register(meterRegistry);
    }

    public ChatResponse trackChat(Callable<ChatResponse> chatCallable) {
        try {
            // Timer.recordCallable declares a checked Exception, so wrap it here
            return chatTimer.recordCallable(() -> {
                try {
                    ChatResponse response = chatCallable.call();
                    successCounter.increment();
                    return response;
                } catch (Exception e) {
                    failureCounter.increment();
                    throw e;
                }
            });
        } catch (Exception e) {
            throw e instanceof RuntimeException
                    ? (RuntimeException) e : new RuntimeException(e);
        }
    }
}
```

3.3 Asynchronous Processing and Queues

For scenarios that do not need a real-time answer, requests can be processed asynchronously (shown here with an in-process thread pool; a message queue serves the same role at larger scale):

```java
@Component
@Slf4j
public class AsyncChatProcessor {

    private final ChatService chatService;
    private final ThreadPoolTaskExecutor executor;
    // ThreadPoolTaskExecutor cannot schedule delayed tasks,
    // so a separate scheduler handles cleanup
    private final ScheduledExecutorService cleanupScheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, CompletableFuture<ChatResponse>> pendingRequests =
            new ConcurrentHashMap<>();

    @Autowired
    public AsyncChatProcessor(ChatService chatService) {
        this.chatService = chatService;
        this.executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(1000);
        executor.setThreadNamePrefix("async-chat-");
        executor.initialize();
    }

    /**
     * Submit an asynchronous chat task.
     */
    public String submitAsyncChat(String message) {
        String taskId = UUID.randomUUID().toString();
        CompletableFuture<ChatResponse> future = CompletableFuture.supplyAsync(() -> {
            try {
                return chatService.chat(message);
            } catch (Exception e) {
                log.error("Async chat failed", e);
                throw new CompletionException(e);
            }
        }, executor);
        pendingRequests.put(taskId, future);

        // Time the task out after 30 seconds, then clean it up later
        future.orTimeout(30, TimeUnit.SECONDS)
                .whenComplete((result, error) -> {
                    if (error != null) {
                        log.warn("Task {} completed with error: {}", taskId, error.getMessage());
                    }
                    // Remove the finished task after 30 minutes
                    cleanupScheduler.schedule(() -> pendingRequests.remove(taskId),
                            30, TimeUnit.MINUTES);
                });
        return taskId;
    }

    /**
     * Fetch the result of an asynchronous task.
     */
    public ChatResponse getAsyncResult(String taskId, long timeoutMs) {
        CompletableFuture<ChatResponse> future = pendingRequests.get(taskId);
        if (future == null) {
            throw new IllegalArgumentException("Task not found: " + taskId);
        }
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new BusinessException("Task still processing");
        } catch (Exception e) {
            throw new BusinessException("Task failed: " + e.getMessage());
        }
    }
}
```

4. Deployment and Operations

4.1 Containerized Deployment with Docker

Package the whole application as a Docker image for easy deployment:

```dockerfile
# Dockerfile
FROM openjdk:11-jre-slim

# Install tooling needed by the health check
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Create the application directory
WORKDIR /app

# Copy the application JAR
COPY target/glm-chat-service.jar app.jar

# Copy configuration
COPY config/application.yml config/

# Health check against the actuator endpoint
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1

# Run the application
ENTRYPOINT ["java", "-jar", "app.jar"]
```

4.2 Kubernetes Deployment

On Kubernetes, the configuration looks like this:

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: glm-chat-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: glm-chat
  template:
    metadata:
      labels:
        app: glm-chat
    spec:
      containers:
        - name: app
          image: your-registry/glm-chat-service:latest
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "prod"
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: glm-chat-service
spec:
  selector:
    app: glm-chat
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```

4.3 Performance Tuning Tips

A few tuning tips drawn from real-world use:

  1. Connection pool sizing: match the HTTP connection pool to your concurrency, so connections are neither starved nor wasted
  2. Cache strategy: give frequently asked questions a long TTL; shorten or disable caching for personalized queries
  3. Batching: when bulk workloads exist, expose a batch endpoint to cut network overhead
  4. Model parameter tuning: adjust temperature, max_tokens, and similar parameters per use case to balance generation quality against speed
  5. Alerting: set alerts on the key metrics, such as response time, error rate, and cache hit rate
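Tip 3 (batching) can be as simple as fanning a list of prompts out over a bounded executor and collecting the results in order. The sketch below assumes a per-prompt chat function such as the ChatService.chat method from section 2.5; it is stubbed here with a Function so the pattern runs on its own.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BatchChat {

    // Fan a batch of prompts out over a bounded pool and wait for all results.
    // In the real service, `chat` would be ChatService::chat.
    static List<String> chatBatch(List<String> prompts,
                                  Function<String, String> chat,
                                  int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<String>> futures = prompts.stream()
                    .map(p -> CompletableFuture.supplyAsync(() -> chat.apply(p), pool))
                    .collect(Collectors.toList());
            // join() in submission order, so output order matches input order
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stubbed "model" that echoes, demonstrating that ordering is preserved
        List<String> out = chatBatch(List.of("a", "b", "c"), p -> "echo:" + p, 2);
        System.out.println(out);   // [echo:a, echo:b, echo:c]
    }
}
```

Bounding the pool size matters: vLLM already batches requests internally, so pushing hundreds of concurrent HTTP calls at it mostly adds queueing without improving throughput.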

5. Real-World Use Cases

This setup solves a range of real problems. For example:

Intelligent customer support: when a user asks about a product, the system quickly retrieves the relevant knowledge-base content and generates an answer. Multi-turn dialogue keeps the conversation natural, and the long context lets the system remember the entire dialogue history.

Document analysis: upload a technical document or a contract and have the AI summarize the key points or answer specific questions. A 1M context means very long documents can be processed without being chopped into fragments.

Coding assistant: developers ask code-related questions and get suggestions grounded in the project context. Asynchronous processing keeps the user interface responsive while complex code suggestions are generated.

Content generation: marketing teams that need product descriptions or ad copy in bulk can push the work through the async queue for higher throughput.

6. Summary

Integrating GLM-4-9B-Chat-1M into Spring Boot as an enterprise-grade API service sounds complex, but it decomposes into a few key steps: get the model running under vLLM, build a solid HTTP client in Spring Boot, layer on the enterprise features (load balancing, caching, retries), and finish with the operational concerns of monitoring and rate limiting.

In practice, the hardest parts are performance tuning and keeping the service stable. The stability of the vLLM service itself, network latency, and concurrency capacity all need adjusting to your actual traffic. The caching strategy matters just as much: done well it dramatically improves response times; done badly it can serve stale or inconsistent data.

We have used this setup in several projects with good results, especially for long-document processing, where the improvement over our previous approach was substantial. You will still need to adapt the specifics to your own workload; parameters such as cache TTLs and rate-limit thresholds only settle after tuning against live traffic.

If you are starting a similar project, begin with a minimal version, get the full pipeline working end to end, and add the advanced features incrementally. That validates the approach quickly and keeps later iteration easy.

