kaldifeat.FbankOptions
If you want to construct an instance of kaldifeat.Fbank or kaldifeat.OnlineFbank, you have to provide an instance of kaldifeat.FbankOptions.
The following interactive session shows how to construct an instance of kaldifeat.FbankOptions and how to read and modify its fields.
$ python3
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import kaldifeat
>>> opts = kaldifeat.FbankOptions()
>>> print(opts)
frame_opts:
samp_freq: 16000
frame_shift_ms: 10
frame_length_ms: 25
dither: 1
preemph_coeff: 0.97
remove_dc_offset: 1
window_type: povey
round_to_power_of_two: 1
blackman_coeff: 0.42
snip_edges: 1
max_feature_vectors: -1
mel_opts:
num_bins: 23
low_freq: 20
high_freq: 0
vtln_low: 100
vtln_high: -500
debug_mel: 0
htk_mode: 0
use_energy: 0
energy_floor: 0
raw_energy: 1
htk_compat: 0
use_log_fbank: 1
use_power: 1
device: cpu
>>> print(opts.dither)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: '_kaldifeat.FbankOptions' object has no attribute 'dither'
>>>
>>> print(opts.frame_opts.dither)
1.0
>>> opts.frame_opts.dither = 0 # disable dither
>>> print(opts.frame_opts.dither)
0.0
>>> import torch
>>> print(opts.device)
cpu
>>> opts.device = 'cuda:0'
>>> print(opts.device)
cuda:0
>>> opts.device = torch.device('cuda', 1)
>>> print(opts.device)
cuda:1
>>> opts.device = 'cpu'
>>> print(opts.device)
cpu
>>> print(opts.mel_opts.num_bins)
23
>>> opts.mel_opts.num_bins = 80
>>> print(opts.mel_opts.num_bins)
80
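Once mel_opts.num_bins is set to 80 as in the session above, it can help to see where those bins land in frequency. The following stdlib-only sketch assumes Kaldi's mel scale, mel(f) = 1127 ln(1 + f/700), triangular bins spaced evenly on that scale, and Kaldi's convention that a non-positive high_freq is an offset from the Nyquist frequency. The functions mel_scale, inverse_mel_scale and mel_bin_centers are hypothetical helpers for illustration, not part of kaldifeat.

```python
import math
from typing import List


def mel_scale(freq_hz: float) -> float:
    # Kaldi-style mel scale: 1127 * ln(1 + f / 700)
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def inverse_mel_scale(mel: float) -> float:
    # Inverse of mel_scale, mapping mels back to Hz
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_bin_centers(num_bins: int, low_freq: float, high_freq: float,
                    samp_freq: float) -> List[float]:
    """Center frequency (Hz) of each triangular mel bin."""
    nyquist = 0.5 * samp_freq
    # A non-positive high_freq is interpreted as an offset from Nyquist,
    # so the default high_freq = 0 means "up to Nyquist".
    if high_freq <= 0.0:
        high_freq = nyquist + high_freq
    mel_low = mel_scale(low_freq)
    mel_high = mel_scale(high_freq)
    # num_bins overlapping triangles need num_bins + 2 evenly spaced mel points;
    # bin i spans points i .. i + 2 and peaks at point i + 1.
    delta = (mel_high - mel_low) / (num_bins + 1)
    return [inverse_mel_scale(mel_low + (i + 1) * delta) for i in range(num_bins)]


centers = mel_bin_centers(num_bins=80, low_freq=20.0, high_freq=0.0,
                          samp_freq=16000.0)
print(len(centers), round(centers[0], 1), round(centers[-1], 1))
```

Because the spacing is uniform in mels rather than in Hz, the low-frequency bins are much narrower than the high-frequency ones.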
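Note that the options mirror those of Kaldi's compute-fbank-feats. The field names do not always match the command-line spellings; for example, --num-mel-bins corresponds to mel_opts.num_bins and --sample-frequency to frame_opts.samp_freq: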
$ compute-fbank-feats --help
compute-fbank-feats
Create Mel-filter bank (FBANK) feature files.
Usage: compute-fbank-feats [options...] <wav-rspecifier> <feats-wspecifier>
Options:
--allow-downsample : If true, allow the input waveform to have a higher frequency than the specified --sample-frequency (and we'll downsample). (bool, default = false)
--allow-upsample : If true, allow the input waveform to have a lower frequency than the specified --sample-frequency (and we'll upsample). (bool, default = false)
--blackman-coeff : Constant coefficient for generalized Blackman window. (float, default = 0.42)
--channel : Channel to extract (-1 -> expect mono, 0 -> left, 1 -> right) (int, default = -1)
--debug-mel : Print out debugging information for mel bin computation (bool, default = false)
--dither : Dithering constant (0.0 means no dither). If you turn this off, you should set the --energy-floor option, e.g. to 1.0 or 0.1 (float, default = 1)
--energy-floor : Floor on energy (absolute, not relative) in FBANK computation. Only makes a difference if --use-energy=true; only necessary if --dither=0.0. Suggested values: 0.1 or 1.0 (float, default = 0)
--frame-length : Frame length in milliseconds (float, default = 25)
--frame-shift : Frame shift in milliseconds (float, default = 10)
--high-freq : High cutoff frequency for mel bins (if <= 0, offset from Nyquist) (float, default = 0)
--htk-compat : If true, put energy last. Warning: not sufficient to get HTK compatible features (need to change other parameters). (bool, default = false)
--low-freq : Low cutoff frequency for mel bins (float, default = 20)
--max-feature-vectors : Memory optimization. If larger than 0, periodically remove feature vectors so that only this number of the latest feature vectors is retained. (int, default = -1)
--min-duration : Minimum duration of segments to process (in seconds). (float, default = 0)
--num-mel-bins : Number of triangular mel-frequency bins (int, default = 23)
--output-format : Format of the output files [kaldi, htk] (string, default = "kaldi")
--preemphasis-coefficient : Coefficient for use in signal preemphasis (float, default = 0.97)
--raw-energy : If true, compute energy before preemphasis and windowing (bool, default = true)
--remove-dc-offset : Subtract mean from waveform on each frame (bool, default = true)
--round-to-power-of-two : If true, round window size to power of two by zero-padding input to FFT. (bool, default = true)
--sample-frequency : Waveform data sample frequency (must match the waveform file, if specified there) (float, default = 16000)
--snip-edges : If true, end effects will be handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame-length. If false, the number of frames depends only on the frame-shift, and we reflect the data at the ends. (bool, default = true)
--subtract-mean : Subtract mean of each feature file [CMS]; not recommended to do it this way. (bool, default = false)
--use-energy : Add an extra dimension with energy to the FBANK output. (bool, default = false)
--use-log-fbank : If true, produce log-filterbank, else produce linear. (bool, default = true)
--use-power : If true, use power, else use magnitude. (bool, default = true)
--utt2spk : Utterance to speaker-id map (if doing VTLN and you have warps per speaker) (string, default = "")
--vtln-high : High inflection point in piecewise linear VTLN warping function (if negative, offset from high-mel-freq (float, default = -500)
--vtln-low : Low inflection point in piecewise linear VTLN warping function (float, default = 100)
--vtln-map : Map from utterance or speaker-id to vtln warp factor (rspecifier) (string, default = "")
--vtln-warp : Vtln warp factor (only applicable if vtln-map not specified) (float, default = 1)
--window-type : Type of window ("hamming"|"hanning"|"povey"|"rectangular"|"sine"|"blackmann") (string, default = "povey")
--write-utt2dur : Wspecifier to write duration of each utterance in seconds, e.g. 'ark,t:utt2dur'. (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
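The --frame-length, --frame-shift, --round-to-power-of-two and --snip-edges descriptions above can be made concrete with a little arithmetic. This sketch follows Kaldi's framing rules as we understand them: durations in milliseconds are converted to sample counts, the FFT input is zero-padded to the next power of two, and snip_edges selects between two frame-count formulas. The helper names are hypothetical and not part of kaldifeat.

```python
def window_size(samp_freq: float, frame_length_ms: float) -> int:
    # Frame length in samples, truncated toward zero
    return int(samp_freq * frame_length_ms / 1000.0)


def window_shift(samp_freq: float, frame_shift_ms: float) -> int:
    # Frame shift in samples, truncated toward zero
    return int(samp_freq * frame_shift_ms / 1000.0)


def padded_window_size(size: int, round_to_power_of_two: bool) -> int:
    # With round_to_power_of_two, the FFT input is zero-padded up to the
    # next power of two (a size that is already a power of two is kept).
    if not round_to_power_of_two:
        return size
    return 1 << (size - 1).bit_length()


def num_frames(num_samples: int, size: int, shift: int, snip_edges: bool) -> int:
    if snip_edges:
        # Only frames that fit completely inside the signal are emitted,
        # so the count depends on the frame length.
        if num_samples < size:
            return 0
        return 1 + (num_samples - size) // shift
    # Otherwise the count depends only on the shift (the data is reflected at the ends).
    return (num_samples + shift // 2) // shift


size = window_size(16000, 25)   # default frame_length_ms
shift = window_shift(16000, 10)  # default frame_shift_ms
print(size, shift, padded_window_size(size, True))
print(num_frames(16000, size, shift, True), num_frames(16000, size, shift, False))
```

With the defaults, a frame is 400 samples, the shift is 160 samples and the FFT size is 512. One second of 16 kHz audio yields 98 frames with snip_edges true and 100 frames with snip_edges false.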
Please refer to the output of compute-fbank-feats --help for the meaning of each field of kaldifeat.FbankOptions.
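Because some field names differ from the command-line spellings, reusing an existing Kaldi fbank config (one --flag=value per line) takes a small lookup table. The sketch below covers only the flags that correspond to fields printed in the session earlier; FLAG_TO_FIELD and parse_kaldi_config are hypothetical helpers written for illustration. The nested dict it produces has the shape that the from_dict tests at the end of this page pass to kaldifeat.FbankOptions.from_dict.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def _to_bool(text: str) -> bool:
    lowered = text.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Kaldi flag -> (option group or None for top level, field name, converter)
FLAG_TO_FIELD: Dict[str, Tuple[Optional[str], str, Callable[[str], Any]]] = {
    "sample-frequency": ("frame_opts", "samp_freq", float),
    "frame-shift": ("frame_opts", "frame_shift_ms", float),
    "frame-length": ("frame_opts", "frame_length_ms", float),
    "dither": ("frame_opts", "dither", float),
    "preemphasis-coefficient": ("frame_opts", "preemph_coeff", float),
    "remove-dc-offset": ("frame_opts", "remove_dc_offset", _to_bool),
    "window-type": ("frame_opts", "window_type", str),
    "round-to-power-of-two": ("frame_opts", "round_to_power_of_two", _to_bool),
    "blackman-coeff": ("frame_opts", "blackman_coeff", float),
    "snip-edges": ("frame_opts", "snip_edges", _to_bool),
    "max-feature-vectors": ("frame_opts", "max_feature_vectors", int),
    "num-mel-bins": ("mel_opts", "num_bins", int),
    "low-freq": ("mel_opts", "low_freq", float),
    "high-freq": ("mel_opts", "high_freq", float),
    "vtln-low": ("mel_opts", "vtln_low", float),
    "vtln-high": ("mel_opts", "vtln_high", float),
    "debug-mel": ("mel_opts", "debug_mel", _to_bool),
    "use-energy": (None, "use_energy", _to_bool),
    "energy-floor": (None, "energy_floor", float),
    "raw-energy": (None, "raw_energy", _to_bool),
    "htk-compat": (None, "htk_compat", _to_bool),
    "use-log-fbank": (None, "use_log_fbank", _to_bool),
    "use-power": (None, "use_power", _to_bool),
}


def parse_kaldi_config(text: str) -> Dict[str, Any]:
    """Turn '--flag=value' lines into a nested dict of FbankOptions fields."""
    result: Dict[str, Any] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if not line.startswith("--"):
            raise ValueError(f"expected --flag=value, got {raw_line!r}")
        name, sep, value = line[2:].partition("=")
        if name not in FLAG_TO_FIELD:
            raise KeyError(f"unsupported flag: --{name}")
        group, field, convert = FLAG_TO_FIELD[name]
        # A bare flag such as --snip-edges is treated as "true";
        # for non-boolean flags the converter then raises ValueError.
        parsed = convert(value) if sep else convert("true")
        target = result.setdefault(group, {}) if group else result
        target[field] = parsed
    return result


config = """
--num-mel-bins=80
--dither=0  # deterministic features
--snip-edges=false
--energy-floor=1.0
"""
print(parse_kaldi_config(config))
```

With kaldifeat installed, the returned dict could then be handed to kaldifeat.FbankOptions.from_dict; any field the config does not mention keeps its default.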
One thing worth noting is that kaldifeat.FbankOptions has a field device, which is an instance of torch.device. You can assign it either a string, e.g., "cpu" or "cuda:0", or an instance of torch.device, e.g., torch.device("cpu") or torch.device("cuda", 1).
Hint
You can use this field to control whether the feature computer constructed from it performs computation on CPU or CUDA.
Caution
If you use a CUDA device, make sure that you have installed a CUDA version of PyTorch.
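Following the hint and caution above, a script can fall back to the CPU when no CUDA build of PyTorch, or no suitable GPU, is available. A minimal sketch: pick_device is a hypothetical helper, and only torch.cuda.is_available() and torch.cuda.device_count() come from PyTorch. The ImportError branch exists only so the sketch runs on its own; kaldifeat itself requires PyTorch.

```python
def pick_device(preferred: str = "cuda:0") -> str:
    """Return a device string suitable for assigning to opts.device."""
    if not preferred.startswith("cuda"):
        return preferred
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA runtime to use.
        return "cpu"
    # is_available() is False for CPU-only PyTorch builds and machines without a GPU.
    if not torch.cuda.is_available():
        return "cpu"
    index = int(preferred.split(":", 1)[1]) if ":" in preferred else 0
    # Fall back to the CPU if the requested GPU index does not exist.
    return preferred if index < torch.cuda.device_count() else "cpu"


device = pick_device()
print(device)
```

The result can then be assigned with opts.device = pick_device(), so the same script runs on machines with and without a GPU.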
Example usage
The following code from https://github.com/csukuangfj/kaldifeat/blob/master/kaldifeat/python/tests/test_fbank_options.py demonstrates the usage of kaldifeat.FbankOptions:
#!/usr/bin/env python3
#
# Copyright (c) 2021 Xiaomi Corporation (authors: Fangjun Kuang)

import pickle

import torch

import kaldifeat


def test_default():
    opts = kaldifeat.FbankOptions()
    print(opts)
    assert opts.frame_opts.samp_freq == 16000
    assert opts.frame_opts.frame_shift_ms == 10.0
    assert opts.frame_opts.frame_length_ms == 25.0
    assert opts.frame_opts.dither == 1.0
    assert abs(opts.frame_opts.preemph_coeff - 0.97) < 1e-6
    assert opts.frame_opts.remove_dc_offset is True
    assert opts.frame_opts.window_type == "povey"
    assert opts.frame_opts.round_to_power_of_two is True
    assert abs(opts.frame_opts.blackman_coeff - 0.42) < 1e-6
    assert opts.frame_opts.snip_edges is True

    assert opts.mel_opts.num_bins == 23
    assert opts.mel_opts.low_freq == 20
    assert opts.mel_opts.high_freq == 0
    assert opts.mel_opts.vtln_low == 100
    assert opts.mel_opts.vtln_high == -500
    assert opts.mel_opts.debug_mel is False
    assert opts.mel_opts.htk_mode is False

    assert opts.use_energy is False
    assert opts.energy_floor == 0.0
    assert opts.raw_energy is True
    assert opts.htk_compat is False
    assert opts.use_log_fbank is True
    assert opts.use_power is True
    assert opts.device.type == "cpu"


def test_set_get():
    opts = kaldifeat.FbankOptions()
    opts.use_energy = True
    assert opts.use_energy is True

    opts.energy_floor = 1
    assert opts.energy_floor == 1

    opts.raw_energy = False
    assert opts.raw_energy is False

    opts.htk_compat = True
    assert opts.htk_compat is True

    opts.use_log_fbank = False
    assert opts.use_log_fbank is False

    opts.use_power = False
    assert opts.use_power is False

    opts.device = torch.device("cuda", 1)
    assert opts.device.type == "cuda"
    assert opts.device.index == 1


def test_set_get_frame_opts():
    opts = kaldifeat.FbankOptions()

    opts.frame_opts.samp_freq = 44100
    assert opts.frame_opts.samp_freq == 44100

    opts.frame_opts.frame_shift_ms = 20.5
    assert opts.frame_opts.frame_shift_ms == 20.5

    opts.frame_opts.frame_length_ms = 1
    assert opts.frame_opts.frame_length_ms == 1

    opts.frame_opts.dither = 0.5
    assert opts.frame_opts.dither == 0.5

    opts.frame_opts.preemph_coeff = 0.25
    assert opts.frame_opts.preemph_coeff == 0.25

    opts.frame_opts.remove_dc_offset = False
    assert opts.frame_opts.remove_dc_offset is False

    opts.frame_opts.window_type = "hanning"
    assert opts.frame_opts.window_type == "hanning"

    opts.frame_opts.round_to_power_of_two = False
    assert opts.frame_opts.round_to_power_of_two is False

    opts.frame_opts.blackman_coeff = 0.25
    assert opts.frame_opts.blackman_coeff == 0.25

    opts.frame_opts.snip_edges = False
    assert opts.frame_opts.snip_edges is False


def test_set_get_mel_opts():
    opts = kaldifeat.FbankOptions()

    opts.mel_opts.num_bins = 100
    assert opts.mel_opts.num_bins == 100

    opts.mel_opts.low_freq = 22
    assert opts.mel_opts.low_freq == 22

    opts.mel_opts.high_freq = 1
    assert opts.mel_opts.high_freq == 1

    opts.mel_opts.vtln_low = 101
    assert opts.mel_opts.vtln_low == 101

    opts.mel_opts.vtln_high = -100
    assert opts.mel_opts.vtln_high == -100

    opts.mel_opts.debug_mel = True
    assert opts.mel_opts.debug_mel is True

    opts.mel_opts.htk_mode = True
    assert opts.mel_opts.htk_mode is True


def test_from_empty_dict():
    opts = kaldifeat.FbankOptions.from_dict({})
    opts2 = kaldifeat.FbankOptions()

    assert str(opts) == str(opts2)


def test_from_dict_partial():
    d = {
        "energy_floor": 10.5,
        "htk_compat": True,
        "mel_opts": {"num_bins": 80, "vtln_low": 1},
        "frame_opts": {"window_type": "hanning"},
    }
    opts = kaldifeat.FbankOptions.from_dict(d)
    assert opts.energy_floor == 10.5
    assert opts.htk_compat is True
    assert opts.mel_opts.num_bins == 80
    assert opts.mel_opts.vtln_low == 1
    assert opts.frame_opts.window_type == "hanning"

    mel_opts = kaldifeat.MelBanksOptions.from_dict(d["mel_opts"])
    assert str(opts.mel_opts) == str(mel_opts)


def test_from_dict_full_and_as_dict():
    opts = kaldifeat.FbankOptions()
    opts.htk_compat = True
    opts.mel_opts.num_bins = 80
    opts.frame_opts.samp_freq = 10

    d = opts.as_dict()
    assert d["htk_compat"] is True
    assert d["mel_opts"]["num_bins"] == 80
    assert d["frame_opts"]["samp_freq"] == 10

    mel_opts = kaldifeat.MelBanksOptions()
    mel_opts.num_bins = 80
    assert d["mel_opts"] == mel_opts.as_dict()

    frame_opts = kaldifeat.FrameExtractionOptions()
    frame_opts.samp_freq = 10
    assert d["frame_opts"] == frame_opts.as_dict()

    opts2 = kaldifeat.FbankOptions.from_dict(d)
    assert str(opts2) == str(opts)

    d["htk_compat"] = False
    d["device"] = torch.device("cuda", 2)
    opts3 = kaldifeat.FbankOptions.from_dict(d)
    assert opts3.htk_compat is False
    assert opts3.device == torch.device("cuda", 2)


def test_pickle():
    opts = kaldifeat.FbankOptions()
    opts.use_energy = True
    opts.use_power = False
    opts.device = torch.device("cuda", 1)

    opts.frame_opts.samp_freq = 44100
    opts.mel_opts.num_bins = 100

    data = pickle.dumps(opts)

    opts2 = pickle.loads(data)
    assert str(opts) == str(opts2)


def main():
    test_default()
    test_set_get()
    test_set_get_frame_opts()
    test_set_get_mel_opts()
    test_from_empty_dict()
    test_from_dict_partial()
    test_from_dict_full_and_as_dict()
    test_pickle()


if __name__ == "__main__":
    main()
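test_from_dict_partial above checks that fields missing from the dictionary keep their default values, both at the top level and inside mel_opts and frame_opts. The following pure-Python model reproduces that observable behavior with a recursive merge over an abridged copy of the defaults printed at the top of this page. It is an illustration only, not how kaldifeat implements from_dict; DEFAULTS and merge_options are hypothetical, and rejecting unknown keys is a choice made for this sketch.

```python
import copy
from typing import Any, Dict

# Abridged copy of the defaults shown by print(kaldifeat.FbankOptions()).
DEFAULTS: Dict[str, Any] = {
    "frame_opts": {"samp_freq": 16000.0, "dither": 1.0,
                   "window_type": "povey", "snip_edges": True},
    "mel_opts": {"num_bins": 23, "low_freq": 20.0, "high_freq": 0.0,
                 "vtln_low": 100.0},
    "energy_floor": 0.0,
    "htk_compat": False,
    "use_energy": False,
}


def merge_options(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of defaults with overrides applied group by group."""
    merged = copy.deepcopy(defaults)  # never mutate the defaults themselves
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown option: {key}")
        if isinstance(merged[key], dict):
            # Nested option group: only the listed fields change.
            merged[key] = merge_options(merged[key], value)
        else:
            merged[key] = value
    return merged


partial = {
    "energy_floor": 10.5,
    "htk_compat": True,
    "mel_opts": {"num_bins": 80, "vtln_low": 1},
    "frame_opts": {"window_type": "hanning"},
}
print(merge_options(DEFAULTS, partial)["mel_opts"])
```

Merging an empty dict returns the defaults unchanged, which corresponds to what test_from_empty_dict checks for the real class.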