FRCRN Speech Denoising in Practice: The Full Pipeline from librosa + ffmpeg Preprocessing to PyTorch Inference

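1. Project Overview

FRCRN (Frequency-Recurrent Convolutional Recurrent Network) is a professional-grade speech denoising model open-sourced by Alibaba DAMO Academy through the ModelScope community. This handbook walks through the complete FRCRN denoising workflow from scratch: environment setup, audio preprocessing, inference, and tuning.

Key advantages:

Optimized for 16 kHz mono audio
Suppresses a wide range of background noise (keyboard clicks, wind, traffic, and so on)
Preserves speech clarity and avoids "robotic voice" distortion
Supports GPU acceleration for faster processing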
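
Because the model is tuned for 16 kHz mono input, it can help to check a file's layout before sending it through the pipeline. The following is a minimal sketch using only the standard-library `wave` module, which reads uncompressed PCM WAV files only. The function names (`describe_wav`, `is_frcrn_ready`, `write_silence`, `demo`) are illustrative and not part of FRCRN or ModelScope.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth()


def is_frcrn_ready(path, expected_rate=16000, expected_channels=1):
    """True if the file already has the 16 kHz mono layout."""
    rate, channels, _ = describe_wav(path)
    return rate == expected_rate and channels == expected_channels


def write_silence(path, rate, channels, seconds=0.1):
    """Helper for the demo: write a short silent 16-bit WAV file."""
    frames = int(rate * seconds)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * frames * channels)


def demo():
    """Compare a 16 kHz mono file with an 8 kHz stereo one."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.wav")
        bad = os.path.join(tmp, "bad.wav")
        write_silence(good, 16000, 1)
        write_silence(bad, 8000, 2)
        return is_frcrn_ready(good), is_frcrn_ready(bad)
```

Files that fail the check can be converted with the FFmpeg step described in section 3.1.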

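2. Environment Setup and Installation

2.1 Basic Requirements

Make sure your system meets the following requirements:

Python 3.8+
PyTorch 1.10+
CUDA 11.3+ (only if you want GPU acceleration)
FFmpeg (audio processing tool)

A conda virtual environment is recommended: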

    conda create -n frcrn python=3.8
    conda activate frcrn
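
Once the environment exists, a quick self-check can catch a missing FFmpeg binary or package before the first inference run. This is a sketch under the assumption that the requirements listed above are the ones that matter; `check_environment` is a hypothetical helper, and it only checks whether packages can be found, not their versions.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 8),
                      packages=("torch", "modelscope", "librosa")):
    """Report which prerequisites are present in the current environment."""
    report = {
        # Tuple comparison: (3, 10, ...) >= (3, 8) is True
        "python_ok": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
    for name in packages:
        # find_spec returns None for a top-level package that is not installed
        report[f"{name}_installed"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'MISSING'}")
```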

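2.2 Installing Dependencies

Install the required Python packages:

    pip install modelscope librosa pydub ffmpeg-python

The later examples also import numpy and soundfile, both of which are installed as librosa dependencies. PyTorch itself is installed separately, matching your CUDA version.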

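2.3 Downloading the Model

Load the FRCRN model through ModelScope. The first argument to pipeline() is the task name; the model ID is passed separately through model:

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k'
    )

The first run automatically downloads the model files (about 300 MB).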

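3. Audio Preprocessing in Practice

3.1 Format Conversion and Resampling

FRCRN expects 16 kHz mono WAV input. Convert with FFmpeg:

    ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav

The same conversion in Python, using ffmpeg-python:

    import ffmpeg

    def convert_audio(input_path, output_path):
        # Resample to 16 kHz (ar) and downmix to one channel (ac)
        stream = ffmpeg.input(input_path)
        stream = ffmpeg.output(stream, output_path, ar=16000, ac=1)
        # overwrite_output=True adds -y so an existing file is replaced
        ffmpeg.run(stream, overwrite_output=True)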
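
If you would rather not depend on ffmpeg-python, the same command can be issued through `subprocess`. This is a minimal sketch that assumes the `ffmpeg` binary is on PATH; `build_ffmpeg_cmd` and `convert_with_cli` are illustrative names. The `-c:a pcm_s16le` flag is an addition to the command shown above that makes the 16-bit PCM encoding explicit.

```python
import subprocess


def build_ffmpeg_cmd(input_path, output_path, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list for a 16-bit PCM WAV conversion."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", input_path,
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of output channels
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        output_path,
    ]


def convert_with_cli(input_path, output_path):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_cmd(input_path, output_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.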

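3.2 Volume Normalization

Use librosa to bring recordings to a consistent level:

    import librosa
    import numpy as np

    def normalize_audio(path, target_dBFS=-20):
        # Load as 16 kHz mono float samples in [-1, 1]
        audio, sr = librosa.load(path, sr=16000)
        # Mean frame-wise RMS; the small floor avoids log10(0) on silence
        rms = librosa.feature.rms(y=audio)
        current_dBFS = 20 * np.log10(max(rms.mean(), 1e-10))
        # Gain in dB needed to reach the target level
        gain = target_dBFS - current_dBFS
        audio_normalized = audio * (10 ** (gain / 20))
        # Guard against clipping when the gain is large
        audio_normalized = np.clip(audio_normalized, -1.0, 1.0)
        return audio_normalized, sr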
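
The gain arithmetic is easier to follow on a toy signal. The sketch below redoes it in plain Python, with one simplification that should be treated as an assumption: it uses the RMS of the whole signal rather than librosa's frame-wise average. The helper names are illustrative.

```python
import math


def rms_dbfs(samples):
    """RMS level in dBFS for float samples in [-1.0, 1.0]."""
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)


def gain_factor(samples, target_dbfs=-20.0):
    """Linear multiplier that moves the RMS level to target_dbfs."""
    gain_db = target_dbfs - rms_dbfs(samples)
    return 10 ** (gain_db / 20)


def normalize(samples, target_dbfs=-20.0):
    """Scale every sample by the gain factor."""
    factor = gain_factor(samples, target_dbfs)
    return [s * factor for s in samples]
```

For a square wave alternating between +0.5 and -0.5, the RMS is 0.5, or about -6.02 dBFS. Reaching -20 dBFS needs about -13.98 dB of gain, which is a linear factor of 0.2, so the normalized signal alternates between +0.1 and -0.1.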

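4. Core Denoising Workflow

4.1 Basic Denoising

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    def denoise_audio(input_path, output_path):
        # Initialize the pipeline (task name plus model ID)
        ans_pipeline = pipeline(
            Tasks.acoustic_noise_suppression,
            model='damo/speech_frcrn_ans_cirm_16k'
        )
        # Run denoising. Passing output_path lets the pipeline write a proper
        # WAV file; result['output_pcm'] is raw PCM without a WAV header, so
        # dumping it straight into a .wav file would not be playable.
        result = ans_pipeline(input_path, output_path=output_path)
        return result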
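
When the result is kept in memory instead, `result['output_pcm']` holds raw sample bytes with no WAV header. The sketch below wraps such bytes in a WAV container using the standard-library `wave` module. It assumes 16-bit mono PCM at 16 kHz, which matches the model's stated input format but should be verified against your ModelScope version; `pcm_to_wav` is an illustrative name.

```python
import os
import tempfile
import wave


def pcm_to_wav(pcm_bytes, wav_path, sample_rate=16000, channels=1,
               sample_width=2):
    """Wrap raw PCM bytes in a WAV container."""
    with wave.open(wav_path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample: 2 = 16-bit (assumed)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)


def demo():
    """Write one second of silence and read the header back."""
    pcm = b"\x00\x00" * 16000  # stand-in for result['output_pcm']
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.wav")
        pcm_to_wav(pcm, path)
        with wave.open(path, "rb") as wf:
            return wf.getframerate(), wf.getnchannels(), wf.getnframes()
```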

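4.2 Batch Processing

    import os
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    def batch_denoise(input_dir, output_dir):
        os.makedirs(output_dir, exist_ok=True)
        # Build the pipeline once and reuse it for every file
        ans_pipeline = pipeline(
            Tasks.acoustic_noise_suppression,
            model='damo/speech_frcrn_ans_cirm_16k'
        )
        for file in sorted(os.listdir(input_dir)):
            if file.lower().endswith('.wav'):
                input_path = os.path.join(input_dir, file)
                output_path = os.path.join(output_dir, f'denoised_{file}')
                # The pipeline writes the denoised WAV to output_path
                ans_pipeline(input_path, output_path=output_path)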
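
For larger batches it can be useful to separate "which files go where" from the inference loop, so the plan can be inspected or logged before any GPU time is spent. This is a sketch using `pathlib`; `plan_batch` is a hypothetical helper, and the `denoised_` prefix simply mirrors the naming used above.

```python
import os
import tempfile
from pathlib import Path


def plan_batch(input_dir, output_dir, prefix="denoised_", suffix=".wav"):
    """Return (source, destination) path pairs for every WAV file."""
    pairs = []
    for src in sorted(Path(input_dir).iterdir()):
        # Match the extension case-insensitively and skip subdirectories
        if src.is_file() and src.suffix.lower() == suffix:
            pairs.append((src, Path(output_dir) / f"{prefix}{src.name}"))
    return pairs


def demo():
    """Plan a batch over a temp directory with two WAVs and one text file."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.wav", "B.WAV", "notes.txt"):
            Path(tmp, name).touch()
        plan = plan_batch(tmp, os.path.join(tmp, "out"))
        return [(src.name, dst.name) for src, dst in plan]
```

Each pair can then be passed to the pipeline inside the loop, with failures caught per file so that one corrupt recording does not abort the whole batch.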

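5. Tuning for Better Results

5.1 Parameter Tuning

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # Example of advanced settings
    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k',
        model_revision='v1.0.1',  # revision tag; confirm it exists for this model
        device='cuda:0',          # use GPU acceleration
        frame_length=512,         # frame length
        hop_length=256            # hop length
    )

The STFT settings of a trained model are normally fixed by its checkpoint, so verify that your ModelScope version actually honors frame_length and hop_length before relying on them.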

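5.2 Post-processing

    import librosa
    import numpy as np

    def post_process(audio, sr=16000):
        # High-frequency boost via a pre-emphasis filter
        audio = librosa.effects.preemphasis(audio, coef=0.97)
        # Dynamic range compression: tanh soft-limits peaks to within 0.8
        audio = np.tanh(audio * 2) * 0.8
        return audio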
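
To see what the two post-processing steps do to individual samples, here is a plain-Python sketch. Pre-emphasis is the first-order filter y[n] = x[n] - coef * x[n-1], and the tanh stage is a soft limiter. This sketch assumes a zero initial state; librosa picks a different initial state by default, so its first output sample may differ. The function names are illustrative.

```python
import math


def preemphasis(samples, coef=0.97):
    """First-order high-pass: y[n] = x[n] - coef * x[n-1], with x[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coef * prev)
        prev = x
    return out


def soft_clip(x, drive=2.0, ceiling=0.8):
    """tanh soft limiter: output magnitude never exceeds ceiling."""
    return math.tanh(x * drive) * ceiling
```

A constant input such as [1.0, 1.0, 1.0] is almost entirely removed after the first sample (each later output is about 0.03), which shows why the filter emphasizes fast-changing, high-frequency content. Note that with drive=2.0 the limiter also amplifies small values (near zero, the output is roughly 1.6 times the input), so it changes overall loudness as well as peaks.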

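6. Case Study: Denoising a Phone Call Recording

6.1 Scenario

Input: 8 kHz phone recording (must be upsampled)
Noise: keyboard typing, office background noise
Goal: extract clear speech

6.2 Complete Code

    import os
    import librosa
    import soundfile as sf
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    def enhance_call_recording(input_path, output_path):
        # Load at the native 8 kHz rate, then upsample to 16 kHz
        y, sr = librosa.load(input_path, sr=8000)
        y_16k = librosa.resample(y, orig_sr=sr, target_sr=16000)
        # Save an intermediate file for the pipeline to read
        temp_path = 'temp.wav'
        sf.write(temp_path, y_16k, 16000)
        try:
            # Denoise; the pipeline writes the WAV to output_path
            ans_pipeline = pipeline(
                Tasks.acoustic_noise_suppression,
                model='damo/speech_frcrn_ans_cirm_16k'
            )
            ans_pipeline(temp_path, output_path=output_path)
        finally:
            # Remove the intermediate file even if denoising fails
            os.remove(temp_path)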
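
A fixed intermediate file name such as `temp.wav` can collide when several recordings are processed at once. One possible alternative, sketched here with the standard library only, is a context manager that hands out a unique path and removes the file afterwards. `temporary_wav` is a hypothetical helper, not part of librosa or ModelScope.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_wav(suffix=".wav"):
    """Yield a unique temporary file path and delete the file on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the caller reopens the file
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Inside the `with temporary_wav() as temp_path:` block, the upsampled audio would be written to `temp_path` and passed to the pipeline, and cleanup happens automatically even if inference raises an exception.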

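7. Performance Optimization Guide

7.1 GPU Acceleration

    import torch
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # Check whether a GPU is available
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    # Specify the device when creating the pipeline
    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k',
        device=device
    )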
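
One way to confirm that GPU acceleration pays off on your hardware is to measure the real-time factor, meaning processing time divided by audio duration, on each device. The helper below is a hypothetical sketch and not a ModelScope API; `process_fn` stands for any zero-argument callable that runs denoising.

```python
import time


def real_time_factor(process_fn, audio_seconds):
    """Return elapsed processing time divided by the audio duration.

    Values below 1.0 mean processing runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    process_fn()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Usage sketch (assumes ans_pipeline exists and input.wav lasts 60 seconds):
# rtf = real_time_factor(lambda: ans_pipeline('input.wav'), audio_seconds=60.0)
```

The first call on a GPU usually includes one-time initialization cost, so it is worth timing a second run before comparing devices.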

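7.2 Memory Optimization

For long audio files, process the audio in chunks:

    import os
    import librosa
    import numpy as np
    import soundfile as sf
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    def chunk_processing(input_path, output_path, chunk_size=30):
        ans_pipeline = pipeline(
            Tasks.acoustic_noise_suppression,
            model='damo/speech_frcrn_ans_cirm_16k'
        )
        # Load the audio
        y, sr = librosa.load(input_path, sr=16000)
        # Chunk length in samples (chunk_size is in seconds)
        chunk_len = int(chunk_size * sr)
        results = []
        for start in range(0, len(y), chunk_len):
            chunk = y[start:start + chunk_len]
            # Save the chunk to a temporary file
            temp_path = f'temp_{start}.wav'
            sf.write(temp_path, chunk, sr)
            try:
                result = ans_pipeline(temp_path)
                # output_pcm is raw PCM (assumed 16-bit); decode to samples
                results.append(np.frombuffer(result['output_pcm'], dtype=np.int16))
            finally:
                os.remove(temp_path)
        # Concatenate the chunks and write a WAV file with a proper header
        sf.write(output_path, np.concatenate(results), sr)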
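
The chunk boundaries are easiest to get right when they are computed in integer samples rather than seconds, so that a trailing partial chunk is never dropped and slice indices are never floats. The following is a minimal sketch; `chunk_bounds` is an illustrative helper.

```python
def chunk_bounds(total_samples, sample_rate=16000, chunk_seconds=30):
    """Return (start, end) sample indices covering the whole signal."""
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1")
    return [
        (start, min(start + chunk_len, total_samples))
        for start in range(0, total_samples, chunk_len)
    ]
```

A 70-second recording at 16 kHz (1,120,000 samples) yields two full 30-second chunks and one 10-second remainder. Hard cuts between chunks may leave audible discontinuities at the boundaries; overlapping chunks with a short crossfade is one possible mitigation, though it is not covered here.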