ChatTTS has been the hottest open-source project of the past few days, collecting 19.7k stars in under a week. It is a text-to-speech model designed specifically for dialogue scenarios, such as LLM-assistant conversations. It supports both English and Chinese, and the largest model was trained on more than 100,000 hours of Chinese and English speech, producing natural, fluent synthesis.
Project address: https://github.com/2noise/ChatTTS/
The official project ships only as a Python package, ChatTTS, which you call from Python:
import torch
import torchaudio
import ChatTTS
from IPython.display import Audio

chat = ChatTTS.Chat()
chat.load_models(compile=False)  # Set to True for better performance

texts = ["PUT YOUR TEXT HERE",]
wavs = chat.infer(texts)

torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)
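For reference, chat.infer returns one waveform per input string as a float array sampled at 24 kHz (the 24000 passed to torchaudio.save), so a clip's length in seconds is simply samples / 24000. A minimal sketch, with a synthetic list standing in for a real waveform:

```python
SAMPLE_RATE = 24000  # ChatTTS output sample rate

def duration_seconds(wav, sample_rate=SAMPLE_RATE):
    """Length of a mono waveform in seconds."""
    return len(wav) / sample_rate

# Synthetic stand-in: 1 second of silence at 24 kHz.
fake_wav = [0.0] * SAMPLE_RATE
print(duration_seconds(fake_wav))  # 1.0
```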
Advanced usage:
###################################
# Sample a speaker from Gaussian.
rand_spk = chat.sample_random_speaker()
params_infer_code = {
    'spk_emb': rand_spk,  # add sampled speaker
    'temperature': .3,    # using custom temperature
    'top_P': 0.7,         # top P decode
    'top_K': 20,          # top K decode
}
###################################
# For sentence level manual control.
# use oral_(0-9), laugh_(0-2), break_(0-7)
# to generate special token in text to synthesize.
params_refine_text = {
    'prompt': '[oral_2][laugh_0][break_6]'
}
wav = chat.infer(texts, params_refine_text=params_refine_text, params_infer_code=params_infer_code)
###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wav = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text, params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wav[0]), 24000)
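The sentence-level control tokens follow the documented ranges oral_(0-9), laugh_(0-2), break_(0-7). A small helper can assemble the prompt string and catch out-of-range values early; refine_prompt below is a hypothetical convenience, not part of the ChatTTS API:

```python
def refine_prompt(oral: int, laugh: int, break_: int) -> str:
    """Build a params_refine_text prompt from the documented
    ranges: oral_(0-9), laugh_(0-2), break_(0-7)."""
    if not (0 <= oral <= 9 and 0 <= laugh <= 2 and 0 <= break_ <= 7):
        raise ValueError("control value out of documented range")
    return f"[oral_{oral}][laugh_{laugh}][break_{break_}]"

params_refine_text = {'prompt': refine_prompt(2, 0, 6)}
print(params_refine_text)  # {'prompt': '[oral_2][laugh_0][break_6]'}
```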
Running the code produces fairly natural-sounding speech, available in both male and female voices.
Project address: https://github.com/Gouryella/ChatTTS-webui
ChatTTS Webui is a browser application built on ChatTTS. It wraps the original package as an API, with a FastAPI backend in server.py:
import asyncio
import os
from io import BytesIO

import numpy as np
import soundfile as sf
import torch
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

import ChatTTS

chat = ChatTTS.Chat()
app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

class Text2Speech(BaseModel):
    text: str
    voice_adj: int
    temperature: float
    top_p: float
    top_k: int

model_path = os.path.join(os.path.dirname(__file__), 'models')
model_files = [
    os.path.join(model_path, 'asset/Decoder.pt'),
    os.path.join(model_path, 'asset/DVAE.pt'),
    os.path.join(model_path, 'asset/GPT.pt'),
    os.path.join(model_path, 'asset/spk_stat.pt'),
    os.path.join(model_path, 'asset/tokenizer.pt'),
    os.path.join(model_path, 'asset/Vocos.pt'),
    os.path.join(model_path, 'config/decoder.yaml'),
    os.path.join(model_path, 'config/dvae.yaml'),
    os.path.join(model_path, 'config/gpt.yaml'),
    os.path.join(model_path, 'config/path.yaml'),
    os.path.join(model_path, 'config/vocos.yaml')
]
all_files_exist = all(os.path.exists(file_path) for file_path in model_files)
assert all_files_exist, "Model files do not exist, please download the models."
print('Load models from local path.')
chat.load_models(source='local', local_path=model_path)

@app.post("/generate")
async def generate_text(request: Text2Speech):
    text = request.text
    # Fixing the RNG seed makes the sampled speaker reproducible,
    # so voice_adj effectively selects a voice.
    torch.manual_seed(request.voice_adj)
    params_infer_code = {
        'spk_emb': chat.sample_random_speaker(),
        'temperature': request.temperature,
        'top_P': request.top_p,
        'top_K': request.top_k,
    }
    # Run blocking inference off the event loop.
    wavs = await asyncio.to_thread(chat.infer, text, use_decoder=True, params_infer_code=params_infer_code)
    audio_data = np.array(wavs[0])
    if audio_data.ndim == 1:
        audio_data = np.expand_dims(audio_data, axis=0)
    audio_buffer = BytesIO()
    sf.write(audio_buffer, audio_data.T, 24000, format='WAV')
    audio_buffer.seek(0)
    return StreamingResponse(audio_buffer, media_type='audio/wav')

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
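Once server.py is running, the /generate endpoint can be exercised from any HTTP client. A minimal stdlib sketch: the field names come from the Text2Speech model above, the URL assumes the default host and port, and the default values (voice_adj=42 and so on) are illustrative, not prescribed by the project:

```python
import json
import urllib.request

def build_payload(text, voice_adj=42, temperature=0.3, top_p=0.7, top_k=20):
    """Assemble the JSON body expected by the Text2Speech request model.
    voice_adj seeds the RNG, so the same value reproduces the same voice."""
    return {
        "text": text,
        "voice_adj": voice_adj,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
    }

def tts(text, url="http://127.0.0.1:8000/generate"):
    """POST to the backend and return the raw WAV bytes it streams back."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Example (with the server running):
#   wav_bytes = tts("Hello from ChatTTS")
#   open("output.wav", "wb").write(wav_bytes)
```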
The frontend is a simple interface written in Nuxt, served on port 3001. It also displays a QR code, so a phone on the same network can scan it and open the page.
Type the text you want synthesized into the text box; you can insert laughter and pauses within a paragraph. The male/female option maps to the voice-adjustment value, which you can also tune by hand; the other three parameters can be left at their defaults. Click Generate and the audio is produced in roughly a minute; you can preview it in the browser and download the audio file.
Installation is straightforward: clone the project, install the frontend dependencies, create a Python virtual environment, install the Python packages, and clone the models. The prerequisites are git, node, and anaconda.
git clone https://github.com/Gouryella/ChatTTS-webui.git
cd ChatTTS-webui
npm install
conda create -n chattts python=3.10
conda activate chattts
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
# If you are using Mac OS or do not support CUDA, use
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2
pip install -r requirements.txt
On Windows, if you have an NVIDIA GPU, run the conda install command above; if not, run the pip install command instead.
cd api
git clone https://huggingface.co/2Noise/ChatTTS.git models
cd ..
The models are large, so the download takes a while and needs a good network connection.
npm run dev
python api/server.py
Run these two commands from the project directory, each in its own terminal, with the second one inside the activated Python virtual environment. Once both have started, open http://127.0.0.1:3001/ in your browser.