http://en.wikipedia.org/wiki/Julius_Speech_Recognition_Engine
http://www.voxforge.org/home/submitspeech/linux/step-2 very good advice for checking that your microphone works properly, in addition to the checks I did myself (which may also help you diagnose problems)
Installing Julius
http://julius.sourceforge.jp/en_index.php?q=en/index.html see support/build-all.sh for more details on compilation (note that it installs into build-bin/)
./configure --with-mictype=alsa --enable-julian
make
make install # as root; installs into /usr/local/bin by default: julius, julian, adinrec, adintool
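After the install step, a quick check that the four binaries ended up on the PATH can save time later. This is a minimal sketch; `missing_tools` is a hypothetical helper name, not something shipped with Julius.

```shell
#!/bin/sh
# Hypothetical helper (not part of Julius): print every command from the
# argument list that cannot be found on PATH; return non-zero if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}

# Usage, with the binaries installed by "make install":
#   missing_tools julius julian adinrec adintool
```

If the build was done with support/build-all.sh instead, the binaries live in build-bin/ and will be reported as missing unless that directory is added to PATH.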
First use of julius
- record from the microphone, sampling at 48 kHz, without trying to do any speech recognition (nosegment)
loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
ZeroFrames: drop
remove DC: off
Recording: essai1_julius.wav (warning: be care of disk space!)
----
[start recording]
[essai1_julius.wav]..............................................[Interrupt]
essai1_julius.wav: 78000 samples (4.88 sec.)
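The sample count in this log is consistent with a 16 kHz output file: with -48, adintool captures at 48 kHz and down-samples to 16 kHz, so 78000 samples last about 4.88 s. A minimal sketch of that arithmetic follows; `wav_duration` is a hypothetical helper name, not part of Julius.

```shell
#!/bin/sh
# Hypothetical helper (not part of Julius): duration in seconds of a recording,
# given its sample count and its sampling rate in Hz, printed with two decimals.
wav_duration() {
    awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", n / rate }'
}

# Sample count taken from the adintool log, rate assumed to be 16 kHz.
wav_duration 78000 16000
```

A duration that is three times longer or shorter than expected is a hint that the file was written at a different rate than the one assumed.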
- argh, it had worked, but now I get this error:
loading FIR filters for down sampling...done
Warning: the rate 48000 Hz is not supported by your PCM hardware.
 ==> Using 96000 Hz instead.
Error: monoral recording not supported on this device/driver
Error: failed to ready input device to 48kHz
tools for diagnosing the recording configuration are sorely lacking! (yet Audacity works in this situation)
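One way to see what the capture hardware accepts, assuming the ALSA driver and the alsa-utils package are in use (these notes do not confirm which driver adintool was built against), is to ask arecord to list the capture devices and dump the hardware parameters. `capture_report` is a hypothetical wrapper name, and the default device `hw:0` is an assumption to adapt.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Julius): show ALSA capture devices and the
# hardware parameters (channels, rates, formats) of one of them.
# Assumes alsa-utils is installed; the device name defaults to hw:0.
capture_report() {
    device=${1:-hw:0}
    if ! command -v arecord >/dev/null 2>&1; then
        echo "arecord not found (alsa-utils is assumed to be installed)" >&2
        return 127
    fi
    # List the capture hardware known to ALSA.
    arecord -l
    # Dump the parameter ranges, then try a one-second capture to /dev/null.
    # The capture itself may fail if the device rejects arecord's defaults;
    # the dump printed before that is the useful part.
    arecord -D "$device" --dump-hw-params -d 1 /dev/null
}

# Usage:
#   capture_report          # inspects hw:0
#   capture_report hw:1,0   # inspects another card/device
```

The CHANNELS and RATE lines of the dump can then be compared with the "monoral recording not supported" and 48000 Hz messages above.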
added perl-Jcode-2.06-1mdv2007.0.x86_64 (apparently only needed for Japanese, so harmless; it removes an error at compile time)
this does not work very well:
arecord -f cdr -d 10 -t raw | adintool -nosegment -in stdin -48 -out file -filename essai2_julius
rec -c 2 -B -t cdr trim 0 2 -
-B: big endian
- ah, I found a way to make it work better
support/build-all.sh # check that there are no compilation errors
build-bin/adintool -nosegment -48 -lv 2500 -in mic -out file -filename essai2
loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
ZeroFrames: drop
remove DC: off
Recording: essai2.wav (warning: be care of disk space!)
----
[start recording]
[essai2.wav]..........................................[Interrupt]
essai2.wav: 74000 samples (4.62 sec.)
do not forget -48, otherwise there is a lot of noise
- retried with --with-mictype=alsa: it does not seem to work :/
First run of julian
julian -input mic -C julian.jconf
include config: julian.jconf
julian.jconf: wrong argument: -iwsp
Try `-help' for more information.
Terminated
=> use a multipath build: support/build-all.sh --enable-multipath --enable-julian
the recorded sound is really quiet; maybe look at libsent/src/adin/adin_mic_linux_oss.c to keep the R channel (instead of L, see the adin_mic_read function)
- this seems to work; I go through a file so that I can compare julian's output with what I recorded
adintool -48 -lv 25000 -nosegment -in mic -out file -filename essai8_julius
[baud@benLapix julius-3.5.2-quickstart-linux]$ ../julius/build-bin/julian -C julian.jconf -input rawfile include config: julian.jconf ###### check configurations ###### build up system Reading in HMM definition...(ascii)...limit check passed defined HMMs: 4382 logical names: 6075 in HMMList base phones: 44 used in logical done Making pseudo bi/mono-phone for IW-triphone...1026 added as logical...done reading [grammar/sample.dfa] and [grammar/sample.dict]... Reading in dictionary... 23 words...done Reading in DFA grammar...done - Gram #0: read [grammars] # 0: [active ] 23 words, 6 categories, 7 nodes (new) "sample" gram "sample" registered - Grammar update check Mapping dict item <-> DFA terminal (category)...done - Gram #0: installed Building HMM lexicon tree...........281 nodes coordination check passed done now beam width = 200 (guess) - update completed [grammars] # 0: [active ] 23 words, 6 categories, 7 nodes "sample" Global: 23 words, 6 categories, 7 nodes Generating addlog table...1953 kb...done All init successfully done ###### initialize input device ------------- System Info begin ------------- Julian rev.3.5.3-multipath (fast) Engine configuration: - Base setup : fast - Tunings : DFA, LibSndFile, IconvOutput - Compiled by: gcc -g -O2 Continuous Speech Recognition Parser based on automaton grammar Files: hmmfilename=acoustic_model_files_build726/hmmdefs hmmmapfilename=acoustic_model_files_build726/tiedlist grammar #1: dfa = grammar/sample.dfa dict = grammar/sample.dict Speech input source: file Acoustic analysis condition: parameter = MFCC_0_D_N_Z (25 dimension from 12 cepstrum) sample frequency = 16000 Hz sample period = 625 (100ns unit) window size = 400 samples (25.0 ms) frame shift = 160 samples (10.0 ms) pre-emphasis = 0.97 # filterbank = 24 cepst. lifter = 22 raw energy = False energy normalize = False delta window = 2 frames (20.0 ms) around hi freq. = OFF lo freq. 
= OFF zero mean frame = OFF base setup from = Julius defaults spectral subtraction = off HMM Info: 4382 models, 2543 states, 2543 mixtures are defined model type = context dependency handling ON training parameter = MFCC_N_D_Z_0 vector length = 25 cov. matrix type = DIAGC duration type = NULLD mixture num = 1 max state num = 5 skippable models = sp (1 model(s)) Dictionary Info: vocabulary size = 23 words, 73 models average word len = 3.2 models, 9.5 states maximum state num = 21 nodes per word transparent words = not exist words under class = not exist Lexicon tree info: total node num = 281 root node num = 22 leaf node num = 23 DFA grammar info: 7 nodes, 9 arcs, 6 terminal(category) symbols category-pair matrix size is 4 bytes Weights and words: (-penalty1) IW penalty1 = +5.0 (-penalty2) IW penalty2 = +20.0 (-cmalpha)CM alpha coef = 0.050000 (-sp)shortpause HMM name= "sp" specified, "sp" applied (physical) found sp category IDs = inter-word short pause = on (append "sp" for each word tail) sp transition penalty = -70.0 Search parameters: 1st pass decoding = batch with sentence CMN 1st pass method = 1-best approx. generating indexed trellis (-b) trellis beam width = 200 (-1 or not specified - guessed) (-n)search candidate num= 1 (-s) search stack size = 500 (-m) search overflow = after 2000 hypothesis poped 2nd pass method = searching sentence, generating N-best (-b2) pass2 beam width = 200 (-lookuprange)lookup range= 5 (tm-5 <= t <tm+5) (-sb)2nd scan beamthres = 200.0 (in logscore) (-gprune)Gauss. pruning = safe (-tmix) mixture thres = 2 / 0 (-n) search till = 1 candidates found (-output) and output = 1 candidates out of above IWCD handling: 1st pass: approximation (use max. prob. of same LC) 2nd pass: loose (apply when hypo. 
is popped and scanned) all possible words will be expanded in 2nd pass build_wchmm2() used lcdset limited by word-pair constraint output word confidence measure based on search-time scores System I/O configuration: speech input source = speech file input filelist = (none, enter filenames from stdin) sampling freq. = 16000 Hz required threaded A/D-in = supported, off zero frames stripping = on silence cutting = off remove DC offset = off reject short input = off short pause segmentation= off result output to = tty (standard out) output charset conv. = disabled ------------- System Info end ------------- ------ ### read waveform input enter filename->essai8_julius.wav input speechfile: essai8_julius.wav file format: Microsoft WAV, Signed 16 bit PCM, file native endian, 16000 Hz, 1 channels 139000 samples (8.69 sec.) ### speech analysis (waveform -> MFCC) length: 867 frames (8.67 sec.) ### Recognition: 1st pass (LR beam with word-pair grammar) .................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 
pass1_best: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s> pass1_best_wordseq: 0 2 4 4 4 4 4 1 pass1_best_phonemeseq: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil pass1_best_score: -21726.998047 ### Recognition: 2nd pass (RL heuristic best-first with DFA) samplenum=867 sentence1: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s> wseq1: 0 2 4 4 4 4 4 1 phseq1: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil cmscore1: 1.000 0.999 0.599 0.952 0.842 0.851 0.590 1.000 score1: -21681.556641 65 generated, 65 pushed, 9 nodes popped in 867
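To compare julian's result with the recording more easily, the sentence1 and cmscore1 lines can be paired word by word, which makes low-confidence words stand out. This is a minimal sketch, assuming julian prints each of these fields on its own line; `julian_word_scores` is a hypothetical helper name, not part of Julius.

```shell
#!/bin/sh
# Hypothetical helper (not part of Julius): read julian output on stdin and
# print one "WORD SCORE" pair per line, taken from sentence1 and cmscore1.
julian_word_scores() {
    awk '
        # Remember the recognized words (field 1 is the "sentence1:" label).
        $1 == "sentence1:" { for (i = 2; i <= NF; i++) word[i - 1] = $i }
        # Print each confidence score next to the word at the same position.
        $1 == "cmscore1:"  { for (i = 2; i <= NF; i++) print word[i - 1], $i }
    '
}

# Usage with a saved log (the file name is only an example):
#   julian_word_scores < julian_output.log
```

Since julian in file mode reads file names from standard input (see the "enter filename->" prompt in the log), something like `echo essai8_julius.wav | julian -C julian.jconf -input rawfile | julian_word_scores` should chain the two steps.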
To do
- look into single-speaker recognition by recording a limited number of words
- use the grammar from http://sourceforge.net/projects/voxforge/ and the acoustic models (a config file is available)
- see this post in particular: http://www.voxforge.org/home/dev/acousticmodels/linux/adapt/htkjulius/live-testing
- and the one for Simon Dialog Manager about a speech-to-text interface
- reuse the sphinx3 grammar for French http://www.dev.voxforge.org/projects/Main/wiki/OtherLanguages see http://www-lium.univ-lemans.fr/speechtools/index.html (Acoustic Models, trigram and quadrigram Language Models)