
GPT-SoVITS for local inference on Intel or Apple Silicon Mac

January 21, 2024, 14:00

Introduction

The GitHub repository for “GPT-SoVITS” is a project focused on voice data processing and text-to-speech (TTS) technology. It highlights the capability of training a good TTS model using as little as one minute of voice data, a method known as “few shot voice cloning.” The project is under the MIT license and involves Python as its primary programming language.

Important: This tutorial is outdated. The project now officially supports macOS, so please follow the tutorial on GitHub instead.

This tutorial covers how to run this project on the CPU on macOS.


Chinese-language version of this tutorial

  • Don’t expect to train on a Mac yet; being able to preprocess and run inference is good enough. Running an LLM might be possible, but if anyone has successfully trained on a Mac (with MPS), please let me know.
  • This tutorial mainly covers inference.
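Since the point above is that MPS works for inference but training on it is unverified, here is a minimal sketch of how device selection might look. This is not the project's actual code; `pick_device` is a hypothetical helper, and it assumes the project uses PyTorch (which exposes `torch.backends.mps` on version 1.12 and later).

```python
# Hypothetical helper (not from GPT-SoVITS itself): pick an inference
# device on macOS, preferring MPS when PyTorch reports it as available
# and falling back to CPU otherwise.
def pick_device() -> str:
    try:
        import torch  # assumed dependency; these projects are PyTorch-based
    except ImportError:
        return "cpu"
    # torch.backends.mps exists on PyTorch >= 1.12; guard for older builds
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

You would then construct the model with `model.to(pick_device())`; for training, the safe choice on a Mac today is still `"cpu"`.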

Bert-Vits2 2.3 Chinese Extra for local inference on Intel or Apple Silicon Mac

January 17, 2024, 10:00

Introduction

Bert-VITS2 is an innovative text-to-speech synthesis project that combines the VITS2 backbone with a multilingual BERT model. This integration allows for enhanced speech synthesis capabilities, especially in multilingual contexts. The project is particularly noteworthy for its specialized version, “Extra: 中文特化版本,” tailored for Chinese language processing. This development represents a significant advancement in the field of speech synthesis, catering to diverse linguistic needs.

This tutorial covers how to run this project on the CPU on macOS.


This tutorial is based on videos and practice from this site.
Below are the reference videos and documents:

  1. How to elegantly create a Bert-VITS2 Dataset

    https://www.bilibili.com/video/BV1rj411v7w1

  2. [Bert-vits2] Cloud

SO-VITS-SVC 4.0 and 4.1 local inference on Intel/Apple Silicon Mac

January 17, 2024, 09:00

Introduction

The SO-VITS-SVC project represents a cutting-edge initiative in the field of voice synthesis and conversion, specifically tailored for applications in singing voice transformation. Leveraging the capabilities of VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) models, this project offers a platform for users to convert spoken or sung audio into the voice of a different character or person.

Primarily targeted at enthusiasts in deep learning and voice synthesis, as well as researchers and hobbyists interested in voice manipulation and anime character voice generation, SO-VITS-SVC serves as a practical tool for applying theoretical knowledge in deep learning to real-world scenarios. The project enables users to experiment with various aspects of voice conversion, including timbre, pitch, and rhythm alterations.
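To give a concrete sense of what "pitch alteration" means, below is a deliberately naive resampling-based pitch shift in plain NumPy. SO-VITS-SVC itself does nothing this simple (it conditions a neural model on extracted f0), so treat this only as an illustration; the function name and approach are my own.

```python
import numpy as np

def naive_pitch_shift(wave: np.ndarray, semitones: float) -> np.ndarray:
    """Shift pitch by resampling the waveform.

    NOTE: this is the crudest possible method. Raising pitch by an
    octave (+12 semitones) also halves the duration, which is exactly
    the artifact that real voice-conversion systems avoid by modeling
    pitch (f0) separately from timing.
    """
    factor = 2 ** (semitones / 12)          # frequency ratio per semitone
    idx = np.arange(0, len(wave), factor)   # read positions in the source
    return np.interp(idx, np.arange(len(wave)), wave)
```

For example, shifting a 1000-sample waveform up by 12 semitones yields a 500-sample result: pitch and duration are coupled, which is why timbre, pitch, and rhythm must be disentangled by the model rather than by simple resampling.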

This tutorial covers how to run this project on the CPU on macOS.
