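For a baby, crying is a way of communicating. It is very limited, but it plays a role similar to conversation between adults. It is also a biological alarm that signals the baby's physiological and psychological needs to the outside world. The information carried in the sound waves of a cry can help determine the baby's physical condition and detect illness. Recognizing cries reliably, that is, "translating" a baby's cry into "adult language" so that we can understand what it means, is therefore of real practical value.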
The dataset sorts cries into six classes:
A: awake (waking up)
B: diaper (needs a diaper change)
C: hug (wants to be held)
D: hungry
E: sleepy (tired)
F: uncomfortable
```python
# Environment setup: install paddlespeech and paddleaudio
!python -m pip install -q -U pip --user
!pip install paddlespeech paddleaudio -U -q
!pip list | grep paddle
```
```python
import warnings
warnings.filterwarnings("ignore")

import IPython
import numpy as np
import matplotlib.pyplot as plt
import paddle
%matplotlib inline

# Extract the dataset archive (commented out here)
# !unzip -qoa data/data41960/dddd.zip
```
```python
from paddleaudio import load

# Load as mono, float32 samples
data, sr = load(file='train/awake/awake_0.wav', mono=True, dtype='float32')
print('wav shape: {}'.format(data.shape))
print('sample rate: {}'.format(sr))

# Plot the waveform
plt.figure()
plt.plot(data)
plt.show()
```

```python
from paddleaudio import load

# Same for the first "diaper" clip
data, sr = load(file='train/diaper/diaper_0.wav', mono=True, dtype='float32')
print('wav shape: {}'.format(data.shape))
print('sample rate: {}'.format(sr))

# Plot the waveform
plt.figure()
plt.plot(data)
plt.show()
```
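```python
# Tag a clip with PaddleSpeech's general-purpose pretrained sound classifier via the CLI
!paddlespeech cls --input train/awake/awake_0.wav
# List the available CLI commands
!paddlespeech help
```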
```python
# Get the duration of a WAV file in seconds
import contextlib
import wave

def get_sound_len(file_path):
    with contextlib.closing(wave.open(file_path, 'r')) as f:
        frames = f.getnframes()
        rate = f.getframerate()
        wav_length = frames / float(rate)
    return wav_length
```
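The helper computes duration as frame count divided by frame rate. The sketch below checks that formula without the dataset. It writes a synthetic mono WAV with the standard-library `wave` module and then measures it. The helper `write_tone`, the 8 kHz rate and the 1.5 s length are arbitrary choices for illustration only.

```python
import contextlib
import math
import os
import struct
import tempfile
import wave


def get_sound_len(file_path):
    """Duration of a PCM WAV file in seconds: frames / frame rate."""
    with contextlib.closing(wave.open(file_path, 'r')) as f:
        return f.getnframes() / float(f.getframerate())


def write_tone(path, seconds, rate=8000, freq=440.0):
    """Hypothetical test helper: write a mono 16-bit sine tone."""
    n = int(seconds * rate)
    samples = (int(10000 * math.sin(2 * math.pi * freq * t / rate)) for t in range(n))
    with contextlib.closing(wave.open(path, 'w')) as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16-bit samples
        f.setframerate(rate)
        f.writeframes(b''.join(struct.pack('<h', s) for s in samples))


path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
write_tone(path, seconds=1.5)
print(get_sound_len(path))  # 12000 frames / 8000 Hz = 1.5
```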
```python
# Collect all training WAV files
import glob

sound_files = glob.glob('train/*/*.wav')
print(sound_files[0])
print(len(sound_files))
```
```python
# Find the longest and shortest clips
sounds_len = []
for sound in sound_files:
    sounds_len.append(get_sound_len(sound))
print("Max audio length:", max(sounds_len), "s")
print("Min audio length:", min(sounds_len), "s")
```
```python
# Copy one sample clip to the home directory to experiment with padding
!cp train/hungry/hungry_0.wav ~/
!pip install pydub -q
```
```python
# Inspect the clip: channels, duration and sample rate
import soundfile as sf

data, samplerate = sf.read('hungry_0.wav')
# sf.read returns shape (frames,) for mono and (frames, channels) otherwise
channels = 1 if data.ndim == 1 else data.shape[1]
length_s = len(data) / float(samplerate)
print(f"channels: {channels}")
print(f"length_s: {length_s}")
print(f"samplerate: {samplerate}")
```
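```python
# Pad one clip to 34 s: repeat it until it is at least 34 s long, then cut
from pydub import AudioSegment

audio = AudioSegment.from_wav('hungry_0.wav')
print(str(audio.duration_seconds))
i = 1
padded = audio
while padded.duration_seconds * 1000 < 34000:
    i = i + 1
    padded = audio * i  # AudioSegment * int repeats the clip
# Slice in milliseconds, resample to 16 kHz and write out
padded[0:34000].set_frame_rate(16000).export('padded-file.wav', format='wav')
```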
```python
# Check the padded file: it should now be 34 s long at 16 kHz
import soundfile as sf

data, samplerate = sf.read('padded-file.wav')
channels = 1 if data.ndim == 1 else data.shape[1]
length_s = len(data) / float(samplerate)
print(f"channels: {channels}")
print(f"length_s: {length_s}")
print(f"samplerate: {samplerate}")
```
```python
# Repeat-pad any clip shorter than the maximum length (34 s), then cut every clip
# to exactly 34 s at 16 kHz. Note: this overwrites the original file in place.
from pydub import AudioSegment

def convert_sound_len(filename):
    audio = AudioSegment.from_wav(filename)
    i = 1
    padded = audio * i
    while padded.duration_seconds * 1000 < 34000:
        i = i + 1
        padded = audio * i
    padded[0:34000].set_frame_rate(16000).export(filename, format='wav')
```
```python
# Bring every training clip to the same fixed length
for sound in sound_files:
    convert_sound_len(sound)
```
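The pydub loop repeats a clip until it reaches 34 s and then cuts it. On a raw sample array, the same idea is tile-then-slice: the clip has to be repeated ceil(target / length) times. The NumPy sketch below reproduces that logic. `repeat_pad` is a hypothetical helper. Unlike the pydub version, it does not resample, so the `set_frame_rate(16000)` step is left out.

```python
import math

import numpy as np


def repeat_pad(samples, target_len):
    """Repeat a 1-D sample array until it has at least target_len samples, then cut."""
    if len(samples) == 0:
        raise ValueError("cannot pad an empty clip")
    repeats = math.ceil(target_len / len(samples))
    return np.tile(samples, repeats)[:target_len]


sr = 16000
target = 34 * sr                             # 34 s at 16 kHz = 544000 samples
clip = np.arange(5 * sr, dtype=np.float32)   # stand-in for a 5 s clip
padded = repeat_pad(clip, target)
print(padded.shape)                          # (544000,)
print(math.ceil(target / len(clip)))         # 7 repeats needed
```

A clip that is already longer than the target gets one "repeat" and is simply truncated. This matches the pydub loop, which skips the `while` body in that case.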
```python
import os
from paddlespeech.audio.datasets.dataset import AudioClassificationDataset

class CustomDataset(AudioClassificationDataset):
    # List all the class labels
    label_list = [
        'awake',
        'diaper',
        'hug',
        'hungry',
        'sleepy',
        'uncomfortable'
    ]
    train_data_dir = './train/'

    def __init__(self, **kwargs):
        files, labels = self._get_data()
        super(CustomDataset, self).__init__(
            files=files, labels=labels, feat_type='raw', **kwargs)

    # Return the audio file paths and their label indices
    def _get_data(self):
        '''
        Return the paths of the wave files and their integer labels.
        '''
        files = []
        labels = []
        for i in range(len(self.label_list)):
            single_class_path = os.path.join(self.train_data_dir, self.label_list[i])
            for sound in os.listdir(single_class_path):
                if sound.endswith('.wav'):
                    sound = os.path.join(single_class_path, sound)
                    files.append(sound)
                    labels.append(i)
        return files, labels
```
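`_get_data` assumes one folder per class, and the position of a folder name in `label_list` becomes its integer label. The standard-library sketch below reproduces that mapping on a throwaway directory tree, so it runs without the dataset. `scan_class_folders` is a hypothetical name. Sorting the file names is an added choice for reproducible ordering and is not part of the original code.

```python
import os
import tempfile
from pathlib import Path

LABELS = ['awake', 'diaper', 'hug', 'hungry', 'sleepy', 'uncomfortable']


def scan_class_folders(root, label_list):
    """Return (files, labels): one entry per .wav file, labelled by its folder's index."""
    files, labels = [], []
    for idx, name in enumerate(label_list):
        for path in sorted(Path(root, name).glob('*.wav')):
            files.append(str(path))
            labels.append(idx)
    return files, labels


# Build a tiny fake tree root/<label>/<label>_0.wav (empty files are enough here)
root = tempfile.mkdtemp()
for name in LABELS:
    os.makedirs(os.path.join(root, name))
    Path(root, name, f'{name}_0.wav').touch()
Path(root, 'hug', 'notes.txt').touch()  # ignored: not a .wav file

files, labels = scan_class_folders(root, LABELS)
print(len(files), labels)  # 6 [0, 1, 2, 3, 4, 5]
```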
```python
# Define the dataloader
import paddle
from paddlespeech.audio.features import LogMelSpectrogram

# Feature config should be aligned with the pretrained model
sample_rate = 16000
feat_conf = {
    'sr': sample_rate,
    'n_fft': 1024,
    'hop_length': 320,
    'window': 'hann',
    'win_length': 1024,
    'f_min': 50.0,
    'f_max': 14000.0,
    'n_mels': 64,
}

train_ds = CustomDataset(sample_rate=sample_rate)
feature_extractor = LogMelSpectrogram(**feat_conf)

train_sampler = paddle.io.DistributedBatchSampler(
    train_ds, batch_size=64, shuffle=True, drop_last=False)
train_loader = paddle.io.DataLoader(
    train_ds,
    batch_sampler=train_sampler,
    return_list=True,
    use_buffer_reader=True)
```
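Two quick numbers follow from this configuration. Both assume that `LogMelSpectrogram` uses a centred STFT (`center=True`, which we believe is its default). First, a 34 s clip at 16 kHz yields 1 + 544000 // 320 = 1701 frames, spaced 20 ms apart. Second, the Nyquist limit of 16 kHz audio is 8000 Hz, so the mel bands that `f_max = 14000.0` places above 8000 Hz cannot receive any energy. The plain-Python sketch below does the arithmetic. `num_frames` is a hypothetical helper.

```python
def num_frames(num_samples, hop_length, center=True, n_fft=1024):
    """Number of STFT frames; center=True pads n_fft // 2 samples on each side."""
    if center:
        return 1 + num_samples // hop_length
    return 1 + (num_samples - n_fft) // hop_length


sample_rate = 16000
hop_length = 320
num_samples = 34 * sample_rate                 # every clip was padded/cut to 34 s

print(num_frames(num_samples, hop_length))     # 1701 frames per clip
print(1000 * hop_length / sample_rate)         # 20.0 ms between frame starts
print(sample_rate / 2)                         # 8000.0 Hz Nyquist limit
```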
We use CNN14 as the backbone to extract audio features:
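```python
from paddlespeech.cls.models import cnn14

# Load pretrained weights and return clip embeddings instead of the original class scores
backbone = cnn14(pretrained=True, extract_embedding=True)
```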
`SoundClassifier` takes CNN14 as its backbone and adds a downstream classification network on top:
```python
import paddle.nn as nn

class SoundClassifier(nn.Layer):
    def __init__(self, backbone, num_class, dropout=0.1):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(self.backbone.emb_size, num_class)

    def forward(self, x):
        x = x.unsqueeze(1)     # [B, T, N] -> [B, 1, T, N]
        x = self.backbone(x)   # clip embedding [B, emb_size]
        x = self.dropout(x)
        logits = self.fc(x)    # [B, num_class]
        return logits

model = SoundClassifier(backbone, num_class=len(train_ds.label_list))
```
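After the backbone, `SoundClassifier` applies only dropout and one linear layer to the clip embedding. The NumPy sketch below traces those shapes for a batch of four. Two things in it are assumptions: the 2048-dimensional embedding stands in for `self.backbone.emb_size`, and the weights are random. The weight is stored as `(in_features, out_features)`, the same layout as Paddle's `nn.Linear`. Dropout is written in the common "inverted" form: kept units are scaled by 1 / (1 - p), and dropout is skipped outside training. That is also why a model should be switched to evaluation mode before prediction.

```python
import numpy as np


def classifier_head(emb, weight, bias, dropout=0.1, training=False, rng=None):
    """Dropout + linear layer on clip embeddings: (B, emb_size) -> (B, num_class)."""
    if training and dropout > 0:
        if rng is None:
            rng = np.random.default_rng(0)
        keep = rng.random(emb.shape) >= dropout
        emb = emb * keep / (1.0 - dropout)   # inverted dropout keeps the expected value
    return emb @ weight + bias               # weight: (emb_size, num_class)


rng = np.random.default_rng(0)
batch, emb_size, num_class = 4, 2048, 6      # 2048 is an assumed embedding size
emb = rng.standard_normal((batch, emb_size)).astype(np.float32)
weight = (rng.standard_normal((emb_size, num_class)) * 0.01).astype(np.float32)
bias = np.zeros(num_class, dtype=np.float32)

print(classifier_head(emb, weight, bias).shape)                 # (4, 6)
print(classifier_head(emb, weight, bias, training=True).shape)  # (4, 6)
```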
```python
# Define the optimizer and the loss
optimizer = paddle.optimizer.Adam(learning_rate=1e-4, parameters=model.parameters())
criterion = paddle.nn.loss.CrossEntropyLoss()
```
```python
from paddleaudio.utils import logger

epochs = 20
steps_per_epoch = len(train_loader)
log_freq = 10

for epoch in range(1, epochs + 1):
    model.train()

    avg_loss = 0
    num_corrects = 0
    num_samples = 0
    for batch_idx, batch in enumerate(train_loader):
        waveforms, labels = batch
        feats = feature_extractor(waveforms)
        feats = paddle.transpose(feats, [0, 2, 1])  # [B, N, T] -> [B, T, N]
        logits = model(feats)

        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        if isinstance(optimizer._learning_rate,
                      paddle.optimizer.lr.LRScheduler):
            optimizer._learning_rate.step()
        optimizer.clear_grad()

        # Accumulate the loss (item() works for 1-element and 0-d tensors alike)
        avg_loss += loss.item()

        # Accumulate accuracy counts
        preds = paddle.argmax(logits, axis=1)
        num_corrects += (preds == labels).numpy().sum()
        num_samples += feats.shape[0]

        if (batch_idx + 1) % log_freq == 0:
            lr = optimizer.get_lr()
            avg_loss /= log_freq
            avg_acc = num_corrects / num_samples

            print_msg = 'Epoch={}/{}, Step={}/{}'.format(
                epoch, epochs, batch_idx + 1, steps_per_epoch)
            print_msg += ' loss={:.4f}'.format(avg_loss)
            print_msg += ' acc={:.4f}'.format(avg_acc)
            print_msg += ' lr={:.6f}'.format(lr)
            logger.train(print_msg)

            avg_loss = 0
            num_corrects = 0
            num_samples = 0
```
[2022-08-24 02:20:49,381] [ TRAIN] - Epoch=17/20, Step=10/15 loss=1.3319 acc=0.4875 lr=0.000100
[2022-08-24 02:21:08,107] [ TRAIN] - Epoch=18/20, Step=10/15 loss=1.3222 acc=0.4719 lr=0.000100
[2022-08-24 02:21:26,884] [ TRAIN] - Epoch=19/20, Step=10/15 loss=1.2539 acc=0.5125 lr=0.000100
[2022-08-24 02:21:45,579] [ TRAIN] - Epoch=20/20, Step=10/15 loss=1.2021 acc=0.5281 lr=0.000100
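Every `log_freq` steps, the loop logs two numbers: the mean of the per-batch losses and the accuracy pooled over every sample seen in that window. The NumPy sketch below isolates that bookkeeping so it can be checked with hand-made logits. `WindowMeter` is a hypothetical helper, not part of PaddleSpeech, and the loss values are made up.

```python
import numpy as np


class WindowMeter:
    """Mirror of the loop's bookkeeping: mean batch loss and sample-weighted accuracy."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.loss_sum, self.steps, self.correct, self.seen = 0.0, 0, 0, 0

    def update(self, batch_loss, logits, labels):
        self.loss_sum += batch_loss
        self.steps += 1
        self.correct += int((np.argmax(logits, axis=1) == labels).sum())
        self.seen += len(labels)

    def summary(self):
        return self.loss_sum / self.steps, self.correct / self.seen


meter = WindowMeter()
# Batch 1: 3 of 4 correct (row 2 predicts class 0 instead of 1)
meter.update(0.5, np.array([[2.0, 0.1, 0.3],
                            [0.2, 1.5, 0.1],
                            [0.9, 0.8, 0.1],
                            [0.1, 0.2, 3.0]]), np.array([0, 1, 1, 2]))
# Batch 2: 2 of 2 correct
meter.update(0.25, np.array([[0.1, 3.0, 0.2],
                             [1.0, 0.0, 0.0]]), np.array([1, 0]))
loss, acc = meter.summary()
print(f'loss={loss:.4f} acc={acc:.4f}')  # loss=0.3750 acc=0.8333
meter.reset()
```

Pooling correct counts over samples, instead of averaging per-batch accuracies, weights a smaller final batch correctly.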
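```python
# Predict the top-3 classes for one test clip
import paddle
import paddle.nn as nn
from paddleaudio import load
from paddlespeech.audio.features import LogMelSpectrogram

top_k = 3
wav_file = 'test/test_0.wav'

# Feature settings must match the ones used in training (feat_conf)
n_fft = 1024
win_length = 1024
hop_length = 320
f_min = 50.0
f_max = 14000.0

# Resample to the training rate (sample_rate = 16000)
waveform, sr = load(wav_file, sr=sample_rate)
feature_extractor = LogMelSpectrogram(
    sr=sr, n_fft=n_fft, hop_length=hop_length, win_length=win_length,
    window='hann', f_min=f_min, f_max=f_max, n_mels=64)

model.eval()  # disable dropout for inference
with paddle.no_grad():
    feats = feature_extractor(paddle.to_tensor(waveform).unsqueeze(0))
    feats = paddle.transpose(feats, [0, 2, 1])  # [B, N, T] -> [B, T, N]
    logits = model(feats)
probs = nn.functional.softmax(logits, axis=1).numpy()

sorted_indices = probs[0].argsort()
msg = f'[{wav_file}]\n'
for idx in sorted_indices[-1:-top_k-1:-1]:
    msg += f'{train_ds.label_list[idx]}: {probs[0][idx]:.5f}\n'
print(msg)
```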
[test/test_0.wav]
diaper: 0.50155
sleepy: 0.41397
hug: 0.05912
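The last step turns the six logits into probabilities with a softmax and reads the top three entries from the end of `argsort()`. The NumPy sketch below does the same on made-up logits, not on real model output. `top_k_labels` is a hypothetical helper. Subtracting the maximum logit before exponentiating is a standard numerical-stability step and leaves the softmax result unchanged.

```python
import numpy as np


def top_k_labels(logits, label_list, k=3):
    """Softmax over one row of logits, then return the k most likely (label, prob) pairs."""
    z = logits - np.max(logits)              # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    order = probs.argsort()[::-1][:k]        # same indices as argsort()[-1:-k-1:-1]
    return [(label_list[i], float(probs[i])) for i in order]


labels = ['awake', 'diaper', 'hug', 'hungry', 'sleepy', 'uncomfortable']
logits = np.array([0.1, 2.0, 0.5, -1.0, 1.8, 0.0])   # made-up scores
for name, p in top_k_labels(logits, labels):
    print(f'{name}: {p:.5f}')
```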
That concludes this walkthrough of infant cry recognition with Python and PaddleSpeech. For more on this topic, see the other related articles on it145.com.