[en] very good advice for checking that your microphone works properly, in addition to the checks I did (which may also help you with diagnosis)
Installing Julius
see support/ for more information on compiling (note: it installs into build-bin/)
./configure --with-mictype=alsa --enable-julian
make install # as root; by default installs julius, julian, adinrec, adintool into /usr/local/bin
First use of julius
- records from the microphone, sampling at 48 kHz, without any speech detection or recognition (nosegment)
loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
ZeroFrames: drop
remove DC: off
Recording: essai1_julius.wav
(warning: be care of disk space!)
----
[start recording]
[essai1_julius.wav]..............................................[Interrupt]
essai1_julius.wav: 78000 samples (4.88 sec.)
- argh, this had worked, and now I get this error:
loading FIR filters for down sampling...done
Warning: the rate 48000 Hz is not supported by your PCM hardware.
==> Using 96000 Hz instead.
Error: monoral recording not supported on this device/driver
Error: failed to ready input device to 48kHz
There is a real lack of tools for diagnosing the recording configuration! (and yet Audacity works in this case)
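In the absence of a dedicated tool, one minimal diagnostic is to check what a working recording (from Audacity or adintool, for example) actually contains: channel count, sample size, rate and duration. The sketch below uses only Python's standard `wave` module. `describe_wav` is a hypothetical helper name. On the adintool file above it should report 16000 Hz, since 78000 samples over 4.88 s works out to 16 kHz.

```python
import wave

def describe_wav(source):
    """Return the basic format of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
        return {
            "channels": w.getnchannels(),
            "sample_bits": 8 * w.getsampwidth(),
            "rate_hz": rate,
            "frames": frames,
            "seconds": frames / rate,
        }

if __name__ == "__main__":
    import sys
    # Print one summary line per WAV file given on the command line
    for path in sys.argv[1:]:
        info = describe_wav(path)
        print(f"{path}: {info['channels']} ch, {info['sample_bits']} bit, "
              f"{info['rate_hz']} Hz, {info['frames']} frames ({info['seconds']:.2f} s)")
```

Saved as a script (any name), it can be run as `python describe_wav.py essai1_julius.wav`.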
added perl-Jcode-2.06-1mdv2007.0.x86_64 (apparently only needed for Japanese; harmless, and it gets rid of an error during compilation)
doesn't really work:
arecord -f cdr -d 10 -t raw | adintool -nosegment -in stdin -48 -out file -filename essai2_julius
rec -c 2 -B -t cdr trim 0 2 -
-B: big endian
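According to the arecord manual, `-f cdr` means 16-bit big-endian stereo at 44100 Hz. The pipeline above also passes `-48` to adintool, which announces a 48 kHz input, and Julius works on mono audio. That rate and channel mismatch is one plausible reason for the poor result, though it has not been verified here. The sketch below shows what the data layout implies: it decodes the interleaved big-endian samples and keeps a single channel. `cdr_to_mono` is a hypothetical name. The output byte order is left as a parameter because the byte order adintool expects on stdin is not established here.

```python
import struct

def cdr_to_mono(raw: bytes, channel: int = 0, out_big_endian: bool = False) -> bytes:
    """Keep one channel of raw 16-bit big-endian interleaved stereo audio.

    channel: 0 = left, 1 = right.
    out_big_endian: byte order of the returned mono samples (an assumption
    to adapt to whatever the consumer of the data expects).
    """
    if channel not in (0, 1):
        raise ValueError("channel must be 0 (left) or 1 (right)")
    usable = len(raw) - len(raw) % 4      # 4 bytes per stereo frame; drop a partial frame
    count = usable // 2                   # number of 16-bit samples, both channels
    samples = struct.unpack(f">{count}h", raw[:usable])
    mono = samples[channel::2]            # interleaved as L, R, L, R, ...
    order = ">" if out_big_endian else "<"
    return struct.pack(f"{order}{len(mono)}h", *mono)
```

To try it, save a capture with `arecord -f cdr -d 10 -t raw > essai.raw` (file name hypothetical), then pass the file's bytes to `cdr_to_mono`. The resulting samples are still at 44100 Hz, so the rate mismatch with `-48` remains.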
- ah, I found a way to make it work better
support/ # check that there are no errors during compilation
build-bin/adintool -nosegment -48 -lv 2500 -in mic -out file -filename essai2
loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
ZeroFrames: drop
remove DC: off
Recording: essai2.wav
(warning: be care of disk space!)
----
[start recording]
[essai2.wav]..........................................[Interrupt]
essai2.wav: 74000 samples (4.62 sec.)
don't forget -48, otherwise there is a lot of noise
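With `-48`, adintool captures at 48 kHz and brings the signal down to the 16 kHz the models use. This explains the "loading FIR filters for down sampling" line, and the saved files match it: 74000 samples for 4.62 s is 16 kHz. The following is a simplified sketch of such a step: a Hamming-windowed sinc low-pass FIR, followed by keeping one sample in three. The tap count and cutoff are arbitrary assumptions, and this is not Julius's actual filter.

```python
import math

def lowpass_fir(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc low-pass filter; cutoff is a fraction of the sample rate."""
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m / 2
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC

def downsample_48k_to_16k(samples: list[float], num_taps: int = 63) -> list[float]:
    """Low-pass filter below the new 8 kHz Nyquist limit, then keep every 3rd sample."""
    factor = 3
    taps = lowpass_fir(num_taps, cutoff=0.9 / (2 * factor))  # about 7.2 kHz at 48 kHz
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        # Centered convolution; samples outside the signal count as zero
        for j, t in enumerate(taps):
            idx = i + half - j
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        out.append(acc)
    return out
```

Filtering before decimation is what keeps content between 8 and 24 kHz from folding back into the 0 to 8 kHz band. Simply dropping two samples out of three would let it alias.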
- retried with --with-mictype=alsa: OK, it doesn't seem to work :/
First run of julian
julian -input mic -C julian.jconf
include config: julian.jconf
julian.jconf: wrong argument: -iwsp
Try `-help' for more information.
=> use a multipath build (in Julius 3.5, -iwsp is only accepted by the multipath version): support/ --enable-multipath --enable-julian
the recorded sound is really weak; maybe look at libsent/src/adin/adin_mic_linux_oss.c to keep the R channel instead of the L one (see the adin_mic_read function)
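Before patching the driver code, it may help to measure which channel actually carries the signal on a stereo recording (made with Audacity, for instance). The essai*_julius.wav files are already mono. The sketch below reports peak and RMS levels per channel and can write a single channel out as a mono WAV. It uses Python's standard `wave` and `struct` modules. `channel_levels` and `extract_channel` are hypothetical names. It does not reproduce what `adin_mic_read` does.

```python
import math
import struct
import wave

def channel_levels(source):
    """Return a (peak, rms) pair per channel for a 16-bit PCM WAV file."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        nch = w.getnchannels()
        data = w.readframes(w.getnframes())
    # WAV PCM samples are little-endian signed 16-bit, interleaved by channel
    samples = struct.unpack(f"<{len(data) // 2}h", data)
    levels = []
    for ch in range(nch):
        chan = samples[ch::nch]
        peak = max((abs(s) for s in chan), default=0)
        rms = math.sqrt(sum(s * s for s in chan) / len(chan)) if chan else 0.0
        levels.append((peak, rms))
    return levels

def extract_channel(source, dest, channel):
    """Write only one channel (0 = left, 1 = right) of a WAV file as a mono WAV."""
    with wave.open(source, "rb") as w:
        nch, width, rate = w.getnchannels(), w.getsampwidth(), w.getframerate()
        data = w.readframes(w.getnframes())
    if not 0 <= channel < nch:
        raise ValueError(f"channel must be between 0 and {nch - 1}")
    frame = nch * width
    start, end = channel * width, (channel + 1) * width
    mono = b"".join(data[i + start:i + end] for i in range(0, len(data), frame))
    with wave.open(dest, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(mono)
```

For example, `channel_levels("stereo.wav")` followed by `extract_channel("stereo.wav", "right.wav", 1)` keeps the R channel if it turns out to be the louder one (file names hypothetical).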
- this seems to work; I go through a file so that I can compare julian's output with what I had recorded:
adintool -48 -lv 25000 -nosegment -in mic -out file -filename essai8_julius
[baud@benLapix julius-3.5.2-quickstart-linux]$ ../julius/build-bin/julian -C julian.jconf -input rawfile
include config: julian.jconf
###### check configurations
###### build up system
Reading in HMM definition...(ascii)...limit check passed
defined HMMs: 4382
logical names: 6075 in HMMList
base phones: 44 used in logical
done
Making pseudo bi/mono-phone for IW-triphone...1026 added as logical...done
reading [grammar/sample.dfa] and [grammar/sample.dict]...
Reading in dictionary...
23 words...done
Reading in DFA grammar...done
- Gram #0: read
[grammars]
# 0: [active ] 23 words, 6 categories, 7 nodes (new) "sample"
gram "sample" registered
- Grammar update check
Mapping dict item <-> DFA terminal (category)...done
- Gram #0: installed
Building HMM lexicon tree...........281 nodes
coordination check passed
done
now beam width = 200 (guess)
- update completed
[grammars]
# 0: [active ] 23 words, 6 categories, 7 nodes "sample"
Global: 23 words, 6 categories, 7 nodes
Generating addlog table...1953 kb...done
All init successfully done

###### initialize input device

------------- System Info begin -------------
Julian rev.3.5.3-multipath (fast)

Engine configuration:
- Base setup : fast
- Tunings : DFA, LibSndFile, IconvOutput
- Compiled by: gcc -g -O2

Continuous Speech Recognition Parser based on automaton grammar

Files:
hmmfilename=acoustic_model_files_build726/hmmdefs
hmmmapfilename=acoustic_model_files_build726/tiedlist
grammar #1:
dfa = grammar/sample.dfa
dict = grammar/sample.dict

Speech input source: file

Acoustic analysis condition:
parameter = MFCC_0_D_N_Z (25 dimension from 12 cepstrum)
sample frequency = 16000 Hz
sample period = 625 (100ns unit)
window size = 400 samples (25.0 ms)
frame shift = 160 samples (10.0 ms)
pre-emphasis = 0.97
# filterbank = 24
cepst. lifter = 22
raw energy = False
energy normalize = False
delta window = 2 frames (20.0 ms) around
hi freq. = OFF
lo freq. = OFF
zero mean frame = OFF
base setup from = Julius defaults
spectral subtraction = off

HMM Info:
4382 models, 2543 states, 2543 mixtures are defined
model type = context dependency handling ON
training parameter = MFCC_N_D_Z_0
vector length = 25
cov. matrix type = DIAGC
duration type = NULLD
mixture num = 1
max state num = 5
skippable models = sp (1 model(s))

Dictionary Info:
vocabulary size = 23 words, 73 models
average word len = 3.2 models, 9.5 states
maximum state num = 21 nodes per word
transparent words = not exist
words under class = not exist

Lexicon tree info:
total node num = 281
root node num = 22
leaf node num = 23

DFA grammar info:
7 nodes, 9 arcs, 6 terminal(category) symbols
category-pair matrix size is 4 bytes

Weights and words:
(-penalty1) IW penalty1 = +5.0
(-penalty2) IW penalty2 = +20.0
(-cmalpha)CM alpha coef = 0.050000
(-sp)shortpause HMM name= "sp" specified, "sp" applied (physical)
found sp category IDs =
inter-word short pause = on (append "sp" for each word tail)
sp transition penalty = -70.0

Search parameters:
1st pass decoding = batch with sentence CMN
1st pass method = 1-best approx. generating indexed trellis
(-b) trellis beam width = 200 (-1 or not specified - guessed)
(-n)search candidate num= 1
(-s) search stack size = 500
(-m) search overflow = after 2000 hypothesis poped
2nd pass method = searching sentence, generating N-best
(-b2) pass2 beam width = 200
(-lookuprange)lookup range= 5 (tm-5 <= t <tm+5)
(-sb)2nd scan beamthres = 200.0 (in logscore)
(-gprune)Gauss. pruning = safe
(-tmix) mixture thres = 2 / 0
(-n) search till = 1 candidates found
(-output) and output = 1 candidates out of above
IWCD handling:
1st pass: approximation (use max. prob. of same LC)
2nd pass: loose (apply when hypo. is popped and scanned)
all possible words will be expanded in 2nd pass
build_wchmm2() used
lcdset limited by word-pair constraint
output word confidence measure based on search-time scores

System I/O configuration:
speech input source = speech file
input filelist = (none, enter filenames from stdin)
sampling freq. = 16000 Hz required
threaded A/D-in = supported, off
zero frames stripping = on
silence cutting = off
remove DC offset = off
reject short input = off
short pause segmentation= off
result output to = tty (standard out)
output charset conv. = disabled

------------- System Info end -------------

------
### read waveform input
enter filename->essai8_julius.wav
input speechfile: essai8_julius.wav
file format: Microsoft WAV, Signed 16 bit PCM, file native endian, 16000 Hz, 1 channels
139000 samples (8.69 sec.)
### speech analysis (waveform -> MFCC)
length: 867 frames (8.67 sec.)
### Recognition: 1st pass (LR beam with word-pair grammar)
........................................
pass1_best: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s>
pass1_best_wordseq: 0 2 4 4 4 4 4 1
pass1_best_phonemeseq: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil
pass1_best_score: -21726.998047
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=867
sentence1: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s>
wseq1: 0 2 4 4 4 4 4 1
phseq1: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil
cmscore1: 1.000 0.999 0.599 0.952 0.842 0.851 0.590 1.000
score1: -21681.556641
65 generated, 65 pushed, 9 nodes popped in 867
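The point of going through a file was to compare julian's output with the recording. To support that, a small parser can pull out the `sentence1:` and `cmscore1:` lines and pair each recognized word with its confidence score. This sketch relies only on the line format visible in the log above. `parse_julian_result`, `low_confidence` and the 0.6 threshold are hypothetical choices, not part of Julius.

```python
def parse_julian_result(log: str, index: int = 1) -> list[tuple[str, float]]:
    """Pair each word of the sentenceN line with the matching cmscoreN value."""
    words = scores = None
    for line in log.splitlines():
        line = line.strip()
        if line.startswith(f"sentence{index}:"):
            words = line.split(":", 1)[1].split()
        elif line.startswith(f"cmscore{index}:"):
            scores = [float(x) for x in line.split(":", 1)[1].split()]
    if words is None or scores is None:
        raise ValueError(f"no sentence{index}/cmscore{index} lines found")
    if len(words) != len(scores):
        raise ValueError("word and score counts differ")
    return list(zip(words, scores))

def low_confidence(pairs: list[tuple[str, float]], threshold: float = 0.6) -> list[str]:
    """Words (sentence markers excluded) whose confidence is below the threshold."""
    return [w for w, s in pairs if s < threshold and w not in ("<s>", "</s>")]
```

On the output above, `low_confidence` flags JORDAN (0.599) and STEVE (0.590). Those are the words to check first against the recording.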
To do
- look into single-speaker recognition by recording a limited number of words
- use a grammar and acoustic models (a config file is available)
- see this post in particular
- and the one on Simon Dialog Manager about a speech-to-text interface
- reuse the Sphinx3 grammar for French, see (Acoustic Models, trigram and quadrigram Language Models)
