ReconnaissanceVocale > julius
http://en.wikipedia.org/wiki/Julius_Speech_Recognition_Engine
http://www.voxforge.org/home/submitspeech/linux/step-2 [en] very good advice for checking that your microphone works properly, in addition to [[ReconnaissanceVocaleEnregistrement the checks I ran myself]] (which may also help you diagnose problems)
===Installing Julius===
http://julius.sourceforge.jp/en_index.php?q=en/index.html see support/build-all.sh for more information on compiling (note: it installs into build-bin/)
./configure --with-mictype=alsa --enable-julian
make
make install # as root; installs into /usr/local/bin by default: julius, julian, adinrec, adintool
===First use of Julius===
~- record from the microphone, sampling at 48 kHz, without speech detection or segmentation (-nosegment); adintool only records here, no recognition is attempted
$ adintool -48 -lv 25000 -nosegment -in mic -out file -filename essai1_julius
%%loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
ZeroFrames: drop
remove DC: off
Recording: essai1_julius.wav (warning: be care of disk space!)
----
[start recording]
[essai1_julius.wav]..............................................[Interrupt]
essai1_julius.wav: 78000 samples (4.88 sec.)
%%
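The log above reports 78000 samples for 4.88 seconds, which corresponds to 16 kHz in the output file (78000 / 16000 = 4.875 s) even though capture was requested at 48 kHz. A quick way to confirm what actually ended up in the file is to read the WAV header. This is a minimal sketch using Python's standard wave module; the helper name wav_summary is hypothetical and not part of Julius.

```python
import wave


def wav_summary(path):
    """Return (sample_rate_hz, channels, sample_count, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()      # samples per second, per channel
        channels = wav.getnchannels()  # 1 = mono, 2 = stereo
        frames = wav.getnframes()      # number of sample frames in the file
    return rate, channels, frames, frames / rate
```

Calling wav_summary("essai1_julius.wav") on the recording above should give a 16000 Hz, single-channel file if the log is to be believed; any other result would point to a capture problem rather than a recognition problem.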
~- argh, this used to work; now I get this error:
%%loading FIR filters for down sampling...done
Warning: the rate 48000 Hz is not supported by your PCM hardware.
==> Using 96000 Hz instead.
Error: monoral recording not supported on this device/driver
Error: failed to ready input device to 48kHz%%
there is a real lack of tools for diagnosing the recording configuration! (yet audacity works in this situation)
added perl-Jcode-2.06-1mdv2007.0.x86_64 (apparently only needed for Japanese; harmless, and it removes one error at compile time)
this does not really work:
arecord -f cdr -d 10 -t raw | adintool -nosegment -in stdin -48 -out file -filename essai2_julius
rec -c 2 -B -t cdr trim 0 2 -
-B : big endian
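One possible reason the pipe above misbehaves: arecord's cdr preset is 16-bit big-endian, 44100 Hz, stereo, while the earlier error message suggests the Julius tools want a single channel. Assuming (not verified here) that the stdin input expects mono samples, one channel can be pulled out of the interleaved stream before it reaches adintool. The sketch below only selects a channel; it does not resample or change byte order, and the function name extract_channel is hypothetical.

```python
def extract_channel(raw, channel=0, channels=2, sample_width=2):
    """Return the bytes of one channel from interleaved PCM data.

    raw          -- interleaved PCM bytes (samples are copied untouched,
                    so the byte order of the input is preserved)
    channel      -- 0 for left, 1 for right in a stereo stream
    channels     -- number of interleaved channels in raw
    sample_width -- bytes per sample (2 for 16-bit audio)
    """
    frame_size = channels * sample_width
    usable = len(raw) - len(raw) % frame_size  # ignore a trailing partial frame
    start = channel * sample_width
    return b"".join(
        raw[i + start:i + start + sample_width]
        for i in range(0, usable, frame_size)
    )
```

Passing channel=1 keeps the right channel instead of the left, which is handy when only one side of the input carries the microphone signal.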
~- ah, I found a way to make it work better
./configure ; make clean
support/build-all.sh # check that there are no compilation errors
build-bin/adintool -nosegment -48 -lv 2500 -in mic -out file -filename essai2
%%loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
ZeroFrames: drop
remove DC: off
Recording: essai2.wav (warning: be care of disk space!)
----
[start recording]
[essai2.wav]..........................................[Interrupt]
essai2.wav: 74000 samples (4.62 sec.)%%
do not forget -48, otherwise the recording is full of noise
~- retried with ""--with-mictype=alsa"": it does not seem to work :/
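Judging by the "loading FIR filters for down sampling" line and the sample counts in the logs, -48 captures at 48 kHz and reduces the signal to 16 kHz, a factor of 3. The idea can be illustrated with a deliberately crude sketch: average each group of three samples (a very rough low-pass filter) and keep one value per group. This is not the filter Julius uses, only an illustration of 3:1 down-sampling; the function name is hypothetical.

```python
def downsample_by_3(samples):
    """Reduce the sample rate by 3 by averaging each group of 3 samples.

    samples -- a sequence of integer PCM sample values
    Returns a list with one averaged value per complete group of 3;
    a trailing incomplete group is dropped.
    """
    usable = len(samples) - len(samples) % 3
    return [int(sum(samples[i:i + 3]) / 3) for i in range(0, usable, 3)]
```

One second of audio at 48 kHz (48000 samples) comes out as 16000 samples, which is the rate the acoustic model in the log below requires.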
===First run of julian===
julian -input mic -C julian.jconf
include config: julian.jconf
julian.jconf: wrong argument: -iwsp
Try `-help' for more information.
Terminated
=> use a multipath build: ""support/build-all.sh --enable-multipath --enable-julian""
the recorded sound is really quiet; it may be worth looking at libsent/src/adin/adin_mic_linux_oss.c to keep the R channel (instead of L, see the function adin_mic_read)
~- this seems to work; I go through a file so that julian's output can be compared with what I recorded using:
adintool -48 -lv 25000 -nosegment -in mic -out file -filename essai8_julius
%%[baud@benLapix julius-3.5.2-quickstart-linux]$ ../julius/build-bin/julian -C julian.jconf -input rawfile
include config: julian.jconf
###### check configurations
###### build up system
Reading in HMM definition...(ascii)...limit check passed
defined HMMs: 4382
logical names: 6075 in HMMList
base phones: 44 used in logical
done
Making pseudo bi/mono-phone for IW-triphone...1026 added as logical...done
reading [grammar/sample.dfa] and [grammar/sample.dict]...
Reading in dictionary...
23 words...done
Reading in DFA grammar...done
- Gram #0: read
[grammars]
# 0: [active ] 23 words, 6 categories, 7 nodes (new) "sample"
gram "sample" registered
- Grammar update check
Mapping dict item <-> DFA terminal (category)...done
- Gram #0: installed
Building HMM lexicon tree...........281 nodes
coordination check passed
done
now beam width = 200 (guess)
- update completed
[grammars]
# 0: [active ] 23 words, 6 categories, 7 nodes "sample"
Global: 23 words, 6 categories, 7 nodes
Generating addlog table...1953 kb...done
All init successfully done
###### initialize input device
------------- System Info begin -------------
Julian rev.3.5.3-multipath (fast)
Engine configuration:
- Base setup : fast
- Tunings : DFA, LibSndFile, IconvOutput
- Compiled by: gcc -g -O2
Continuous Speech Recognition Parser based on automaton grammar
Files:
hmmfilename=acoustic_model_files_build726/hmmdefs
hmmmapfilename=acoustic_model_files_build726/tiedlist
grammar #1:
dfa = grammar/sample.dfa
dict = grammar/sample.dict
Speech input source: file
Acoustic analysis condition:
parameter = MFCC_0_D_N_Z (25 dimension from 12 cepstrum)
sample frequency = 16000 Hz
sample period = 625 (100ns unit)
window size = 400 samples (25.0 ms)
frame shift = 160 samples (10.0 ms)
pre-emphasis = 0.97
# filterbank = 24
cepst. lifter = 22
raw energy = False
energy normalize = False
delta window = 2 frames (20.0 ms) around
hi freq. = OFF
lo freq. = OFF
zero mean frame = OFF
base setup from = Julius defaults
spectral subtraction = off
HMM Info:
4382 models, 2543 states, 2543 mixtures are defined
model type = context dependency handling ON
training parameter = MFCC_N_D_Z_0
vector length = 25
cov. matrix type = DIAGC
duration type = NULLD
mixture num = 1
max state num = 5
skippable models = sp (1 model(s))
Dictionary Info:
vocabulary size = 23 words, 73 models
average word len = 3.2 models, 9.5 states
maximum state num = 21 nodes per word
transparent words = not exist
words under class = not exist
Lexicon tree info:
total node num = 281
root node num = 22
leaf node num = 23
DFA grammar info:
7 nodes, 9 arcs, 6 terminal(category) symbols
category-pair matrix size is 4 bytes
Weights and words:
(-penalty1) IW penalty1 = +5.0
(-penalty2) IW penalty2 = +20.0
(-cmalpha)CM alpha coef = 0.050000
(-sp)shortpause HMM name= "sp" specified, "sp" applied (physical)
found sp category IDs =
inter-word short pause = on (append "sp" for each word tail)
sp transition penalty = -70.0
Search parameters:
1st pass decoding = batch with sentence CMN
1st pass method = 1-best approx. generating indexed trellis
(-b) trellis beam width = 200 (-1 or not specified - guessed)
(-n)search candidate num= 1
(-s) search stack size = 500
(-m) search overflow = after 2000 hypothesis poped
2nd pass method = searching sentence, generating N-best
(-b2) pass2 beam width = 200
(-lookuprange)lookup range= 5 (tm-5 <= t <tm+5)
(-sb)2nd scan beamthres = 200.0 (in logscore)
(-gprune)Gauss. pruning = safe
(-tmix) mixture thres = 2 / 0
(-n) search till = 1 candidates found
(-output) and output = 1 candidates out of above
IWCD handling:
1st pass: approximation (use max. prob. of same LC)
2nd pass: loose (apply when hypo. is popped and scanned)
all possible words will be expanded in 2nd pass
build_wchmm2() used
lcdset limited by word-pair constraint
output word confidence measure based on search-time scores
System I/O configuration:
speech input source = speech file
input filelist = (none, enter filenames from stdin)
sampling freq. = 16000 Hz required
threaded A/D-in = supported, off
zero frames stripping = on
silence cutting = off
remove DC offset = off
reject short input = off
short pause segmentation= off
result output to = tty (standard out)
output charset conv. = disabled
------------- System Info end -------------
------
### read waveform input
enter filename->essai8_julius.wav
input speechfile: essai8_julius.wav
file format: Microsoft WAV, Signed 16 bit PCM, file native endian, 16000 Hz, 1 channels
139000 samples (8.69 sec.)
### speech analysis (waveform -> MFCC)
length: 867 frames (8.67 sec.)
### Recognition: 1st pass (LR beam with word-pair grammar)
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
pass1_best: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s>
pass1_best_wordseq: 0 2 4 4 4 4 4 1
pass1_best_phonemeseq: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil
pass1_best_score: -21726.998047
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=867
sentence1: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s>
wseq1: 0 2 4 4 4 4 4 1
phseq1: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil
cmscore1: 1.000 0.999 0.599 0.952 0.842 0.851 0.590 1.000
score1: -21681.556641
65 generated, 65 pushed, 9 nodes popped in 867%%
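Two things in the log above are easy to check by hand. First, 139000 samples with a 400-sample window and a 160-sample shift give (139000 - 400) // 160 + 1 = 867 frames, which matches the reported length; this assumes the usual framing formula, not something confirmed from the Julius source. Second, each word in sentence1 has a matching confidence value in cmscore1, and pairing them makes the weak spots obvious. A minimal sketch, with hypothetical helper names:

```python
def frame_count(samples, window=400, shift=160):
    """Number of full analysis windows that fit in the given sample count."""
    if samples < window:
        return 0
    return (samples - window) // shift + 1


def word_confidences(log_text):
    """Pair each word of the sentence1 line with its cmscore1 value."""
    fields = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("sentence1", "cmscore1"):
            fields[key] = value.split()
    words = fields.get("sentence1", [])
    scores = [float(s) for s in fields.get("cmscore1", [])]
    if len(words) != len(scores):
        raise ValueError("word and score counts differ")
    return list(zip(words, scores))
```

On the run above, the lowest scores are the first JORDAN (0.599) and STEVE (0.590), so those are the words to listen for when comparing the result with the recording.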
===To do===
~- look into single-speaker recognition by recording a limited number of words
~- use the grammar from http://sourceforge.net/projects/voxforge/ and its acoustic models (a configuration file is available)
~~- see this post in particular http://www.voxforge.org/home/dev/acousticmodels/linux/adapt/htkjulius/live-testing
~~- and the one about [[http://www.voxforge.org/home/forums/message-boards/general-discussion/simon-dialog-manager-and-julian-speech-recognition Simon Dialog Manager]] concerning a [[http://sourceforge.net/projects/speech2text/ speech-to-text interface]]
~~- reuse the sphinx3 grammar for French http://www.dev.voxforge.org/projects/Main/wiki/OtherLanguages see http://www-lium.univ-lemans.fr/speechtools/index.html (Acoustic Models, trigram and quadrigram Language Models)
----
CategoryLangFr CategoryHobby