
Four Ways to Run OpenAI Whisper Inference Locally on CPU


In this article, I'll summarize my experiments with running inference for the Whisper automatic speech recognition (ASR) model.

Whisper is an open-source, Transformer-based ASR model released by OpenAI. In my case, the model was fine-tuned on a dataset of speech recordings from people with speech impairments.

I tried the following options for running inference on CPU:

  • HuggingFace pipeline
  • ONNX Runtime
  • OpenVino Runtime
  • PyTorch inference

All of these approaches (except ONNX) are implemented in this Git repository.

TL;DR

Here are the final results:

  • PyTorch (16 cores) ≈ 3.5 s
  • OpenVino, int4 ≈ 4.0 s
  • OpenVino, int8 ≈ 4.2 s
  • PyTorch (8 cores) ≈ 4.2 s
  • PyTorch (4 cores) ≈ 8.0 s
  • HF pipeline ≈ 18.0 s

Below, I'll walk through the implementation of each solution in detail.

1. Whisper Inference via the HuggingFace Pipeline

Since our model was trained with the transformers library and is stored on the HuggingFace Hub, the first and most straightforward option is to use the built-in pipeline.

Here is the class that runs Whisper inference through the HF pipeline. There are plenty of tutorials available, so I won't go into much detail here:

import os
import asyncio

import torch
from huggingface_hub import login
from peft import PeftConfig, PeftModel
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    PreTrainedModel,
    WhisperForConditionalGeneration,
    WhisperProcessor,
    WhisperTokenizer,
    logging as transformers_log,
)

from src import log
from src.utils import utils


class WhisperService:
    _initialized = False

    def __init__(self, language='en'):
        if not WhisperService._initialized:
            os.environ["TRANSFORMERS_VERBOSITY"] = "error"
            transformers_log.set_verbosity_error()
            self.model_name = utils.MODEL_NAME
            self.language = language
            self.task = utils.TASK

            try:
                # Initialize model and related components
                log.info("Starting Whisper service...")
                self.peft_config = self.generate_model_config()
                self.model = self.get_whisper_model_from_hf(self.peft_config)
                self.tokenizer = self.create_tokenizer(self.peft_config)
                self.processor = self.create_processor(self.peft_config)
                self.pipeline_asr, self.forced_decoder_ids = self.create_whisper_pipeline(
                    self.model, self.tokenizer, self.processor
                )
                WhisperService._initialized = True
                log.info("Whisper service started with success!")
            except Exception as e:
                log.error(f"Error during Whisper service init: {str(e)}")
                raise

    def generate_model_config(self) -> PeftConfig:
        """
        """
        try:
            login(token=os.environ['API_TOKEN'])
            config = PeftConfig.from_pretrained(self.model_name)
            log.info("Model config generated")
            return config
        except Exception as e:
            log.error(f"Error during model config generation: {str(e)}")
            raise

    def get_whisper_model_from_hf(self, peft_config: PeftConfig) -> PeftModel:
        """
        """
        try:
            model = WhisperForConditionalGeneration.from_pretrained(
                    peft_config.base_model_name_or_path
                )
            # Check if GPU is available
            if torch.cuda.is_available():
                log.info("Model loaded on GPU")
            else:
                log.info("Model loaded on CPU")

            model = PeftModel.from_pretrained(model, self.model_name)
            log.info("Whisper model configured with PeftModel")
            return model
        except Exception as e:
            log.error(f"Error during Whisper model loading: {str(e)}")
            raise

    def create_processor(self, peft_config: PeftConfig) -> WhisperProcessor:
        """
        """
        try:
            processor = WhisperProcessor.from_pretrained(
                peft_config.base_model_name_or_path,
                language=self.language,
                task=self.task
            )
            log.info("WhisperProcessor created")
            return processor
        except Exception as e:
            log.error(f"Error during WhisperProcessor creation: {str(e)}")
            raise

    def create_tokenizer(self, peft_config: PeftConfig) -> WhisperTokenizer:
        """
        """
        try:
            tokenizer = WhisperTokenizer.from_pretrained(
                peft_config.base_model_name_or_path,
                language=self.language,
                task=self.task
            )
            log.info("WhisperTokenizer created")
            return tokenizer
        except Exception as e:
            log.error(f"Error during WhisperTokenizer creation: {str(e)}")
            raise

    def create_whisper_pipeline(self, model: PreTrainedModel, tokenizer: WhisperTokenizer,
                                processor: WhisperProcessor) -> tuple:
        """
        """
        try:
            feature_extractor = processor.feature_extractor
            pipe_lora = AutomaticSpeechRecognitionPipeline(
                model=model,
                tokenizer=tokenizer,
                feature_extractor=feature_extractor
            )
            forced_decoder_ids = processor.get_decoder_prompt_ids(language=self.language, task=self.task)
            log.info("Pipeline created")
            return pipe_lora, forced_decoder_ids
        except Exception as e:
            log.error(f"Error during Pipeline creation: {str(e)}")
            raise

    async def transcribe(self, audio_path: str) -> str:
        """
        """
        try:
            loop = asyncio.get_event_loop()
            log.info(f"Transcribing the following file audio: {audio_path}")
            with torch.cuda.amp.autocast():
                text = await loop.run_in_executor(
                    None,
                    lambda:
                    self.pipeline_asr(audio_path, generate_kwargs={"forced_decoder_ids": self.forced_decoder_ids},
                                      max_new_tokens=255)["text"]
                )
            log.info("Transcription completed!")
            return text
        except Exception as e:
            log.error(f"Error during transcription: {str(e)}")
            raise

Here we fetch the model from the HuggingFace Hub (utils.MODEL_NAME is the HF model identifier, e.g. "miosipof/asr_EN_medium_v1").

Note that this model is an adapter trained with the help of the PEFT (Parameter-Efficient Fine-Tuning) framework. We use the generate_model_config() function to extract the PEFT model's configuration.

The pipeline is set up by the following code:

pipe_lora = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=tokenizer,
    feature_extractor=feature_extractor
)
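
For completeness, here is a minimal usage sketch of the class above. The audio path is hypothetical, and the API_TOKEN environment variable must be set for the HuggingFace login to succeed:

import asyncio

async def main():
    # Instantiate the service (loads model, tokenizer, processor and pipeline)
    service = WhisperService(language="en")
    # Hypothetical audio path
    text = await service.transcribe("./path_to/audio.wav")
    print(text)

asyncio.run(main())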

2. ONNX Runtime

2.1 Model Format Conversion

The model first needs to be converted to ONNX format.

Let's import some libraries:

from onnxruntime.quantization import quantize_dynamic, QuantType
import onnx
import numpy as np
import onnxruntime as ort
import torch
import torchaudio
import whisper
from transformers import WhisperProcessor

Next, we'll use the transformers optimum library and its CLI to convert the model from HuggingFace to ONNX format:

pip install optimum[exporters]
# the optimum task name for Whisper export is "automatic-speech-recognition"
optimum-cli export onnx --model local_path --task automatic-speech-recognition local_model_folder/

This creates a set of files in local_model_folder from the original model located at local_path.
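
As a quick sanity check, you can validate the exported graphs and inspect their input names. The file names below are assumptions based on a typical optimum Whisper export, which produces separate encoder and decoder graphs:

import onnx
import onnxruntime as ort

# Hypothetical file names produced by the export
for path in ["local_model_folder/encoder_model.onnx",
             "local_model_folder/decoder_model.onnx"]:
    onnx.checker.check_model(onnx.load(path))         # structural validation
    sess = ort.InferenceSession(path)
    print(path, [i.name for i in sess.get_inputs()])  # expected input names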

Let's set up the ONNX session:

session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session_options.execution_mode = ort.ExecutionMode.ORT_PARALLEL
session_options.intra_op_num_threads = 4
session_options.inter_op_num_threads = 16

We will use the encoder and the decoder separately:

sess_encoder = ort.InferenceSession("./path_to/encoder_q.onnx")
sess_decoder = ort.InferenceSession("./path_to/decoder_q.onnx")

To improve performance, we define a model quantization function and then apply it to both the encoder and the decoder:

def quantize_onnx_model(onnx_model_path, quantized_model_path):
    # Dynamic (weight-only) quantization to unsigned 8-bit integers
    quantize_dynamic(onnx_model_path,
                     quantized_model_path,
                     weight_type=QuantType.QUInt8)  # changed from QInt8 to QUInt8

quantize_onnx_model("./path_to/encoder.onnx", "./path_to/encoder_q.onnx")
quantize_onnx_model("./path_to/decoder.onnx", "./path_to/decoder_q.onnx")

2.2 Inference with the ONNX Model

Let's initialize the processor and the tokenizer:

processor = WhisperProcessor.from_pretrained("./path_to/q_whisper_onnx")
# tokenizer = processor.tokenizer
# `model` below is the original OpenAI Whisper model (e.g. whisper.load_model("medium"));
# for a multilingual checkpoint you can simply pass True instead of model.is_multilingual
tokenizer = whisper.decoding.get_tokenizer(
    model.is_multilingual,
    task="transcribe",
    language="en",
)

The audio preprocessing script (similar to Whisper's log_mel_spectrogram() function) converts a .wav file into a log-mel spectrogram array:

def preprocessing_torchaudio(audio_path):
    waveform, sample_rate = torchaudio.load(audio_path)
    waveform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)(waveform)
    mel = processor.feature_extractor(waveform[0], sampling_rate=16000).input_features
    return torch.tensor(mel, dtype=torch.float32)

For a sample .wav file, the audio array x_mel will be:

x_mel = preprocessing_torchaudio("./path_to/audio.wav")

Finally, a custom loop that encodes and decodes the sequence using our quantized ONNX model:

max_tokens = 448
out_encoder, = sess_encoder.run(["last_hidden_state"], {"input_features": x_mel.numpy()})

# Start the decoder input with the start-of-transcript token
# (tokenizer.sot_sequence could be used for the full prompt)
next_token = tokenizer.sot
# next_token = "<|startoftranscript|>"
x_tokens = torch.tensor([[tokenizer.sot]], dtype=torch.int64)

while x_tokens.shape[1] <= max_tokens and next_token != tokenizer.eot:
    out_decoder, = sess_decoder.run(
        ["logits"],
        {
            "input_ids": x_tokens.numpy(),
            "encoder_hidden_states": out_encoder,
        },
    )
    # Greedy decoding: pick the most probable next token
    next_token = out_decoder[0, -1].argmax()
    next_token = torch.tensor(next_token)

    print(next_token, next_token.shape, x_tokens.shape)

    x_tokens = torch.concat(
        [x_tokens, next_token.reshape(1, 1)],
        axis=1,
    )

print(tokenizer.decode(x_tokens[0]))

I left the code in this rough shape because ONNX inference performance was always much worse than inference through OpenVino or PyTorch, possibly because the ONNX format was originally developed for convolutional neural networks and may not be the best option for optimizing Transformers.

3. OpenVino Runtime

The OpenVino inference implementation is much simpler.

First, some necessary imports:

import os
from pathlib import Path
from transformers import WhisperProcessor, logging as transformers_log
from optimum.intel.openvino import OVModelForSpeechSeq2Seq
import torchaudio
import torch
import numpy as np
import time

from src import log
from src.utils import utils

import asyncio

3.1 Converting the Model to OpenVino Format

We'll use the transformers optimum library to export our HuggingFace model to OpenVino format (you can replace openai/whisper-medium with your own model or any other Whisper model hosted on the HuggingFace Hub):

pip install optimum[openvino,nncf]
optimum-cli export openvino --model openai/whisper-medium --weight-format int8 asr_openvino_int8

Note that we used int8 quantization for the export. I also tried int4 quantization, but in my case it noticeably degraded transcription quality.
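
For reference, an int4 export would differ only in the weight-format flag (assuming an optimum-intel version that accepts int4 for --weight-format):

optimum-cli export openvino --model openai/whisper-medium --weight-format int4 asr_openvino_int4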

Here is the method we'll use to obtain the OpenVino model:

    def get_openvino_model(self):

        ov_config = {"CACHE_DIR": ""}
        self.model = OVModelForSpeechSeq2Seq.from_pretrained(self.ov_model_name, ov_config=ov_config, compile=False)
        log.info("OpenVino model loaded from " + str(self.ov_model_name))

        try:
            ov_model_path = Path("src/model/" + self.model_name.replace("/", "_"))
            ov_config = {"CACHE_DIR": ""}
        
            if not ov_model_path.exists():
                self.model = OVModelForSpeechSeq2Seq.from_pretrained(
                    self.model_name,
                    ov_config=ov_config,
                    export=True,
                    compile=False,
                    load_in_8bit=False,
                )
                self.model.half()
                self.model.save_pretrained(ov_model_path)
                log.info("HF model converted to OpenVino and saved in " + str(ov_model_path))
            else:
                self.model = OVModelForSpeechSeq2Seq.from_pretrained(ov_model_path, ov_config=ov_config, compile=False)
                log.info("OpenVino model loaded from " + str(ov_model_path))
        
        except Exception as e:
            log.error(f"Error during OpenVino model loading: {str(e)}")
            raise

        return self.model

Here self.ov_model_name will be asr_openvino_int8, the name we used earlier in the optimum CLI command (plus its path). I used an ugly self.model_name.replace("/", "_") call to turn the HuggingFace URL into a local model name.

Next, the OpenVino model must be compiled, since it will be loaded directly by the OpenVino runtime:

    def compile_openvino_model(self):
        """
        """
        try:

            if torch.cuda.is_available():
                log.info("Model loaded on GPU")
                self.device = "GPU"
            else:
                log.info("Model loaded on CPU")
                self.device = "CPU"

            self.model.to(self.device)
            self.model.compile()

            log.info("OpenVino model compiled successfully")

        except Exception as e:
            log.error(f"Error during OpenVino model compilation: {str(e)}")
            raise

        return self.model

3.2 Inference with the OpenVino Model

Now we define two helper functions: one creates the Whisper processor for encoding (its cost is negligible compared to the forward pass), the other handles audio preprocessing:

    def create_processor(self):
        """
        """
        try:
            processor = WhisperProcessor.from_pretrained(
                self.model_name,
                language=self.language,
                task=self.task
            )
            log.info("WhisperProcessor created")
            return processor
        except Exception as e:
            log.error(f"Error during WhisperProcessor creation: {str(e)}")
            raise


    def preprocess_audio(self, waveform):
        """
        """
        # compute log-Mel input features from input audio array
        audio_features = self.processor.feature_extractor(waveform, sampling_rate=self.sr).input_features[0]
        audio_features = torch.tensor(np.array([audio_features]))

        return audio_features

Finally, the pipeline and an async transcription function, similar to the HuggingFace pipeline implementation:

    def openvino_pipeline(self,audio_path):

        print("1 - starting audio load:", time.time())
        waveform, sample_rate = torchaudio.load(audio_path)
        waveform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=self.sr)(waveform)[0]
        print("2 - starting preprocessing:", time.time())
        audio_features = self.preprocess_audio(waveform)

        print("3 - starting forward pass:", time.time())
        predicted_ids = self.model.generate(audio_features, max_new_tokens=224)

        print("4 - starting decoding:", time.time())
        transcription = self.processor.batch_decode(predicted_ids, skip_special_tokens=True)

        return transcription[0]


    async def transcribe(self, audio_path: str) -> str:
        """
        """
        try:
            loop = asyncio.get_event_loop()
            log.info(f"Transcribing the following file audio: {audio_path}")

            print("0 - starting the loop:",time.time())
            text = await loop.run_in_executor(
                None,
                lambda: self.openvino_pipeline(audio_path)
                )

            print("5 - all done:", time.time())
            log.info("Transcription completed!")
            return text

        except Exception as e:
            log.error(f"Error during transcription: {str(e)}")
            raise
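
Putting the pieces together, a minimal driver could look like the sketch below. The class name OpenVinoService and its attribute wiring are assumptions; in the repository, an __init__ sets ov_model_name, model_name, language, task and sr before these methods are called:

import asyncio

# Hypothetical wiring of the methods shown above
service = OpenVinoService()
service.model = service.get_openvino_model()    # load (or convert) the OpenVino model
service.compile_openvino_model()                # compile for CPU/GPU
service.processor = service.create_processor()  # feature extractor + tokenizer

text = asyncio.run(service.transcribe("./path_to/audio.wav"))  # hypothetical path
print(text)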

4. PyTorch Inference

Inference through the original PyTorch implementation of Whisper involves several steps:

  • In my case, the fine-tuned model used for inference lives on the HuggingFace Hub, so I first have to fetch it from there;
  • We also need the original base Whisper model from the OpenAI GitHub repo (its size should match our fine-tuned model; in my case, Whisper-Medium);
  • The fine-tuned model from HF has to be mapped to the OpenAI format (see the details here);
  • Our pretrained weights are then applied to the base model;
  • Then we can put the model into evaluation mode and run inference.

Let's start by fetching the model from the HuggingFace Hub:

    def get_hf_model(self):
        """
        """
        try:
            merged_model = WhisperForConditionalGeneration.from_pretrained(self.model_name)

            pt_model_name = os.path.basename(self.model_name) + ".pth"
            pt_dir_name = os.path.join("assets","pt_models")

            self.pretrained_model_path = os.path.join(pt_dir_name, pt_model_name)

            if not os.path.exists(pt_dir_name):
                os.makedirs(pt_dir_name)
                log.info(f"Directory {pt_dir_name} created and will be used to store PyTorch models")
            else:
                log.info(f"Directory {pt_dir_name} exists, using it to save PyTorch model")

            torch.save(merged_model.state_dict(), self.pretrained_model_path)
            log.info(f"HF model saved to {self.pretrained_model_path} in PyTorch format for conversion")

        except Exception as e:
            log.error(f"Error during HuggingFace model loading: {str(e)}")
            raise

Here self.model_name is my model id on HuggingFace (note that it should not be the adapter, but the fully merged model).

4.1 Model Conversion from HuggingFace to PyTorch

The layer names used in the transformers implementation of Whisper differ from those used in the original OpenAI repo. I wrote a short note about this here.

The mapping function (from HF to OpenAI) is this one:

    def map_hf_to_pt(self,pretrained_weights):

        def rename_key(key):
            new_key = key
            for k, v in self.mapping:
                new_key = new_key.replace(k, v)
            return new_key

        # Rename the keys in the state_dict
        updated_weights = {rename_key(k): v for k, v in pretrained_weights.items()}
        updated_weights.pop('proj_out.weight', None)

        return updated_weights

Now we apply this mapping to the base Whisper model, using the pretrained weights of the model we downloaded from the HuggingFace Hub. The mapping itself is defined as follows (a condensed application sketch follows the mapping definition):

self.mapping = [ ('model.', ''),
           ('decoder.layers', 'decoder.blocks'), 
           ('encoder.layers', 'encoder.blocks'), 
           
           ('encoder.embed_positions.weight', 'encoder.positional_embedding'), 
           
           ('self_attn.k_proj', 'attn.key'),
           ('self_attn.q_proj', 'attn.query'),
           ('self_attn.v_proj', 'attn.value'),
           ('self_attn.out_proj', 'attn.out'),

           ('self_attn_layer_norm', 'attn_ln'),
           ('final_layer_norm', 'mlp_ln'),
           ('fc1', 'mlp.0'),
           ('fc2', 'mlp.2'),

           ('encoder_attn.k_proj','cross_attn.key'),
           ('encoder_attn.v_proj','cross_attn.value'),
           ('encoder_attn.q_proj','cross_attn.query'),
           ('encoder_attn.out_proj','cross_attn.out'),
           ('encoder_attn_layer_norm','cross_attn_ln'),

           ('decoder.embed_positions.weight','decoder.positional_embedding'),
           ('decoder.embed_tokens','decoder.token_embedding'),

           ('encoder.layer_norm','encoder.ln_post'),

           ('decoder.layer_norm','decoder.ln'),
          ]
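
Putting it together, the weight conversion boils down to the sketch below, condensed from the set_pt_model() method in the full listing further down. The .pth path is hypothetical, and map_hf_to_pt() is treated here as a plain function that uses the mapping above:

import torch
import whisper

# Load the original OpenAI base model (its size must match the fine-tuned model)
model = whisper.load_model("medium")

# Hypothetical path to the state_dict saved by get_hf_model()
pretrained_weights = torch.load("assets/pt_models/my_model.pth")

# Rename HuggingFace layer names to OpenAI layer names and load the weights
updated_weights = map_hf_to_pt(pretrained_weights)
model.load_state_dict(updated_weights, strict=True)

model.requires_grad_(False)
model.eval()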

4.2 Inference with PyTorch

We are almost all set. Let's define the Whisper processor and the encoding function:

    def create_processor(self):
        """
        """
        try:
            processor = WhisperProcessor.from_pretrained(
                self.model_name,
                language=self.language,
                task=self.task
            )
            log.info("WhisperProcessor created")
            return processor
        except Exception as e:
            log.error(f"Error during WhisperProcessor creation: {str(e)}")
            raise


    def preprocess_audio(self, waveform):
        """
        """
        # compute log-Mel input features from input audio array
        mel = self.processor.feature_extractor(waveform, sampling_rate=self.sr).input_features
        return torch.tensor(mel, dtype=torch.float32)

Finally, the pipeline and the transcription function:

    def inference_pipeline(self, audio_path):

        log.info("1 - Starting audio load:")
        # waveform, sample_rate = librosa.load(audio_path, sr=self.sr)
        waveform, sample_rate = torchaudio.load(audio_path)
        waveform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=self.sr)(waveform)[0]

        log.info("2 - starting preprocessing:")
        audio_features = self.preprocess_audio(waveform)

        log.info("3 - Starting forward pass:")

        with torch.no_grad():
            result = whisper.decode(
                self.model,
                audio_features,
                options=whisper.DecodingOptions(
                    fp16=False,
                    language="it",
                    without_timestamps=True,
                    suppress_blank=False,
                    suppress_tokens=[],
                ),
            )

        return result[0].text


    async def transcribe(self, audio_path: str) -> DecodingResult | list[DecodingResult]:
        """
        """
        try:
            loop = asyncio.get_event_loop()
            log.info(f"Transcribing the following file audio: {audio_path}")
            log.info("Transcription started...")

            text = await loop.run_in_executor(
                None,
                lambda: self.inference_pipeline(audio_path)
                )

            log.info("Transcription completed!")

            return text

        except Exception as e:
            log.error(f"Error during transcription: {str(e)}")
            raise

Below is the full code of the PyTorch inference class implementation. Note the torch.set_num_threads(num_threads) call during initialization: this line sets the number of CPU cores used for inference, which strongly affects performance:

import os

from src import log
from src.utils import utils

import asyncio
import whisper
from whisper import DecodingResult

from transformers import WhisperForConditionalGeneration, WhisperProcessor, logging as transformers_log
from huggingface_hub import hf_hub_download, login

import torch
import torchaudio
import torch.quantization

class InferenceService:
    _initialized = False

    def __init__(self, language='it', num_threads=1, quantization=True, device = "cpu"):
        try:
            login(token=os.environ['API_TOKEN'])
            log.info("HuggingFace login successful")
        except Exception as e:
            log.error(f"Error during HuggingFace login: {str(e)}")
            raise

        if not InferenceService._initialized:
            os.environ["TRANSFORMERS_VERBOSITY"] = "error"
            transformers_log.set_verbosity_error()
            self.model_name = utils.MERGED_MODEL_NAME
            self.language = language
            self.pytorch_converted_model_source = utils.PRETRAINED_MODEL_PTH
            self.pytorch_converted_model_filename = utils.PRETRAINED_MODEL_FILENAME
            self.task = utils.TASK
            self.device = device
            self.sr = utils.SAMPLING_RATE
            self.mapping = utils.HF_PT_MAPPING

            try:
                # Initialize model and related components
                log.info("Starting PyTorch Inference service...")

                try:
                    self.pretrained_model_path = hf_hub_download(repo_id=self.pytorch_converted_model_source,
                                                                 filename=self.pytorch_converted_model_filename)
                    log.info(f"Whisper pretrained model downloaded to {self.pretrained_model_path}")

                except Exception as e:
                    log.info(f"Unable to download the PyTorch model: {str(e)} - switching to model from HF for conversion")
                    self.get_hf_model()

                self.model = self.set_pt_model()

                if quantization:
                    self.model = torch.quantization.quantize_dynamic(self.model,
                                                                {torch.nn.Linear},
                                                                dtype=torch.qint8)

                self.model = self.model.cpu()
                self.processor = self.create_processor()

                InferenceService._initialized = True
                log.info("PyTorch Inference service started with success!")

            except Exception as e:
                log.error(f"Error during PyTorch Inference service init: {str(e)}")
                raise

        torch.set_num_threads(num_threads)
        log.info(f"Number of threads set to {num_threads} for PyTorch calculations")

    def get_hf_model(self):
        """
        """
        try:
            merged_model = WhisperForConditionalGeneration.from_pretrained(self.model_name)

            pt_model_name = os.path.basename(self.model_name) + ".pth"
            pt_dir_name = os.path.join("assets","pt_models")

            self.pretrained_model_path = os.path.join(pt_dir_name, pt_model_name)

            if not os.path.exists(pt_dir_name):
                os.makedirs(pt_dir_name)
                log.info(f"Directory {pt_dir_name} created and will be used to store PyTorch models")
            else:
                log.info(f"Directory {pt_dir_name} exists, using it to save PyTorch model")

            torch.save(merged_model.state_dict(), self.pretrained_model_path)
            log.info(f"HF model saved to {self.pretrained_model_path} in PyTorch format for conversion")

        except Exception as e:
            log.error(f"Error during HuggingFace model loading: {str(e)}")
            raise

        return 1

    def map_hf_to_pt(self,pretrained_weights):

        def rename_key(key):
            new_key = key
            for k, v in self.mapping:
                new_key = new_key.replace(k, v)
            return new_key

        # Rename the keys in the state_dict
        updated_weights = {rename_key(k): v for k, v in pretrained_weights.items()}
        updated_weights.pop('proj_out.weight', None)

        return updated_weights

    def set_pt_model(self):

        model = whisper.load_model("medium")
        log.info("Whisper base model loaded")

        pretrained_model = torch.load(self.pretrained_model_path)
        log.info(f"Whisper pretrained model loaded from {self.pretrained_model_path}")

        # Extract state_dict if the loaded model is not already a state_dict
        if hasattr(pretrained_model, "state_dict"):
            pretrained_weights = pretrained_model.state_dict()  # extract the state dict
        else:
            pretrained_weights = pretrained_model  # it's already a state_dict

        #######################################################################

        updated_weights = self.map_hf_to_pt(pretrained_weights)
        model.load_state_dict(updated_weights, strict=True)

        log.info(f"Model weights mapped from HuggingFace model to PyTorch")

        # Activate to save converted model and/or its weights
        # torch.save(model, 'src/model/whisper_pretrained_converted.pth')
        # torch.save(updated_weights, 'src/model/whisper_pretrained_converted_weights.pth')

        ######################################################################

        model.to(self.device)
        model.requires_grad_(False)
        model.eval()

        log.info("Whisper PyTorch model loaded on " + str(self.device))

        return model


    def create_processor(self):
        """
        """
        try:
            processor = WhisperProcessor.from_pretrained(
                self.model_name,
                language=self.language,
                task=self.task
            )
            log.info("WhisperProcessor created")
            return processor
        except Exception as e:
            log.error(f"Error during WhisperProcessor creation: {str(e)}")
            raise


    def preprocess_audio(self, waveform):
        """
        """
        # compute log-Mel input features from input audio array
        mel = self.processor.feature_extractor(waveform, sampling_rate=self.sr).input_features
        return torch.tensor(mel, dtype=torch.float32)


    def inference_pipeline(self,audio_path):

        log.info("1 - Starting audio load:")
        # waveform, sample_rate = librosa.load(audio_path, sr=self.sr)
        waveform, sample_rate = torchaudio.load(audio_path)
        waveform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=self.sr)(waveform)[0]

        log.info("2 - starting preprocessing:")
        audio_features = self.preprocess_audio(waveform)

        log.info("3 - Starting forward pass:")

        with torch.no_grad():
            result = whisper.decode(
                self.model,
                audio_features,
                options=whisper.DecodingOptions(
                    fp16=False,
                    language="it",
                    without_timestamps=True,
                    suppress_blank=False,
                    suppress_tokens=[],
                ),
            )

        return result[0].text


    async def transcribe(self, audio_path: str) -> DecodingResult | list[DecodingResult]:
        """
        """
        try:
            loop = asyncio.get_event_loop()
            log.info(f"Transcribing the following file audio: {audio_path}")
            log.info("Transcription started...")

            text = await loop.run_in_executor(
                None,
                lambda: self.inference_pipeline(audio_path)
                )

            log.info("Transcription completed!")

            return text

        except Exception as e:
            log.error(f"Error during transcription: {str(e)}")
            raise
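
As a usage sketch (the audio path is hypothetical, API_TOKEN must be set, and the utils module must define the constants referenced above), varying num_threads corresponds to the 4/8/16-core configurations listed in the TL;DR:

import asyncio

# num_threads controls how many CPU cores PyTorch uses for inference
service = InferenceService(language="it", num_threads=16, quantization=True, device="cpu")
text = asyncio.run(service.transcribe("./path_to/audio.wav"))  # hypothetical path
print(text)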

Original article: Whisper本地推理的4种方法 (Four Ways to Run Whisper Inference Locally) - 汇智网
