ReconnaissanceVocale > julius

http://en.wikipedia.org/wiki/Julius_Speech_Recognition_Engine

http://www.voxforge.org/home/submitspeech/linux/step-2 [en] very good advice for checking that your microphone works properly, in addition to the checks I ran myself (which may also help you diagnose problems)

Installing Julius

http://julius.sourceforge.jp/en_index.php?q=en/index.html see support/build-all.sh for more details on compiling (note: it installs into build-bin/)
./configure --with-mictype=alsa --enable-julian
make
make install # as root; installs into /usr/local/bin by default: julius, julian, adinrec, adintool
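
To confirm that the install step put the four binaries on the PATH, a minimal Python sketch like the following can help. The tool names come from the install note above; the helper name `find_tools` is only illustrative.

```python
import shutil

# Binaries listed in the install note above.
EXPECTED_TOOLS = ("julius", "julian", "adinrec", "adintool")


def find_tools(names=EXPECTED_TOOLS):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

If a tool shows up as NOT FOUND after a support/build-all.sh build, look in build-bin/ instead, since that script does not install into /usr/local/bin.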

First use of julius

$ adintool -48 -lv 25000 -nosegment -in mic -out file -filename essai1_julius
loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
  ZeroFrames: drop
   remove DC: off
   Recording: essai1_julius.wav (warning: be care of disk space!)
----
[start recording]
[essai1_julius.wav]..............................................[Interrupt]
essai1_julius.wav: 78000 samples (4.88 sec.)
loading FIR filters for down sampling...done
Warning: the rate 48000 Hz is not supported by your PCM hardware.
	     ==> Using 96000 Hz instead.
Error: monoral recording not supported on this device/driver
Error: failed to ready input device to 48kHz

tools for diagnosing the recording configuration are sorely lacking! (yet audacity works in this case)
added perl-Jcode-2.06-1mdv2007.0.x86_64 (apparently only needed for Japanese, harmless, removes an error at compile time)
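
Given the lack of diagnostic tools noted above, one stopgap is to inspect the recorded file itself. The sketch below uses Python's standard `wave` module to report the sample rate, channel count and sample width of a WAV file, and compares them with the format that appears in the julian logs on this page (16000 Hz, 1 channel, signed 16 bit). The function names `describe_wav` and `problems_for_julius` are hypothetical, and the check is only a sanity test, not a statement of everything Julius accepts.

```python
import wave


def describe_wav(path):
    """Read the header of a PCM WAV file and return its basic properties."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        bits = 8 * w.getsampwidth()
        frames = w.getnframes()
    return {
        "rate": rate,
        "channels": channels,
        "bits": bits,
        "frames": frames,
        "seconds": frames / rate,
    }


def problems_for_julius(info, rate=16000):
    """List mismatches against 16 kHz / mono / 16-bit (values seen in the logs)."""
    issues = []
    if info["rate"] != rate:
        issues.append(f"sample rate is {info['rate']} Hz, expected {rate} Hz")
    if info["channels"] != 1:
        issues.append(f"{info['channels']} channels, expected 1 (mono)")
    if info["bits"] != 16:
        issues.append(f"{info['bits']}-bit samples, expected 16-bit")
    return issues
```

Typical use would be `problems_for_julius(describe_wav("essai1_julius.wav"))`, which returns an empty list when the file matches.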

does not work very well:
arecord -f cdr -d 10 -t raw | adintool -nosegment -in stdin -48 -out file -filename essai2_julius

rec -c 2 -B -t cdr trim 0 2 -
-B : big endian
./configure ; make clean
support/build-all.sh # check that there are no compilation errors
build-bin/adintool -nosegment -48 -lv 2500 -in mic -out file -filename essai2
loading FIR filters for down sampling...done
Audio I/O Latency = 42 msec (data fragment = 2048 frames)
AD-in thread created
----
Input-Source: Microphone
Segmentation: OFF
  ZeroFrames: drop
   remove DC: off
   Recording: essai2.wav (warning: be care of disk space!)
----
[start recording]
[essai2.wav]..........................................[Interrupt]
essai2.wav: 74000 samples (4.62 sec.)

do not forget -48, otherwise there is a lot of noise
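
The logs above suggest what -48 does: the device is opened at 48 kHz ("failed to ready input device to 48kHz"), FIR filters are loaded for down sampling, and the sample counts match a 16 kHz output file. The short Python check below reproduces the durations printed by adintool under that assumption; the 16000 Hz figure is inferred from the samples/seconds ratio in the logs.

```python
CAPTURE_RATE_HZ = 48000  # rate requested from the sound card with -48
TARGET_RATE_HZ = 16000   # inferred from the logs: 78000 samples in 4.88 sec.


def duration_seconds(samples, rate_hz=TARGET_RATE_HZ):
    """Duration of a recording given its sample count and sample rate."""
    return samples / rate_hz


for name, samples in (("essai1_julius.wav", 78000), ("essai2.wav", 74000)):
    print(name, duration_seconds(samples))  # 4.875 and 4.625

# Ratio between capture rate and output rate.
print(CAPTURE_RATE_HZ // TARGET_RATE_HZ)  # 3
```

The computed values, 4.875 s and 4.625 s, agree with the 4.88 and 4.62 sec. reported in the logs.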

First launch of julian

julian -input mic -C julian.jconf
include config: julian.jconf
julian.jconf: wrong argument: -iwsp
Try `-help' for more information.
Terminated
=> use a multipath build: support/build-all.sh --enable-multipath --enable-julian
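
To spot this kind of problem before launching julian, a jconf file can be scanned for options that need a multipath build. The Python sketch below is hypothetical: the only option known from these notes to require multipath is -iwsp, and the parsing assumes that jconf files hold whitespace-separated options with `#` starting a comment.

```python
# Options that need a multipath build. Only -iwsp is taken from the error
# message above; anything else added here would be a guess.
MULTIPATH_ONLY = {"-iwsp"}


def multipath_options(jconf_text):
    """Return the multipath-only options found in the text of a jconf file."""
    found = []
    for line in jconf_text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' starts a comment
        for token in line.split():
            if token in MULTIPATH_ONLY and token not in found:
                found.append(token)
    return found


# Made-up jconf content for illustration (file names are placeholders).
sample = "-h hmmdefs\n-hlist tiedlist\n-iwsp  # inter-word short pause\n"
print(multipath_options(sample))  # ['-iwsp']
```

A non-empty result means the configuration should be run with a binary built with --enable-multipath.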

the recorded sound is really not loud; maybe look at libsent/src/adin/adin_mic_linux_oss.c to keep the R channel (instead of L, see the adin_mic_read function)
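
Before patching the C source, it may be worth checking whether the right channel really carries more signal than the left one. The Python sketch below computes the RMS level of each channel of a stereo 16-bit WAV file (for example one recorded with audacity or arecord); `channel_rms` is an illustrative helper, not part of Julius.

```python
import array
import math
import sys
import wave


def channel_rms(path):
    """Return the RMS level of each channel of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    levels = []
    for c in range(channels):
        chan = samples[c::channels]  # samples are interleaved: L R L R ...
        if len(chan) == 0:
            levels.append(0.0)
        else:
            levels.append(math.sqrt(sum(s * s for s in chan) / len(chan)))
    return levels
```

For a stereo file the result is `[left_rms, right_rms]`; a much larger second value would support the idea of keeping the R channel.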

[baud@benLapix julius-3.5.2-quickstart-linux]$ ../julius/build-bin/julian -C julian.jconf -input rawfile
include config: julian.jconf
###### check configurations
###### build up system
Reading in HMM definition...(ascii)...limit check passed
   defined HMMs:  4382
  logical names:  6075 in HMMList
	base phones:    44 used in logical
done
Making pseudo bi/mono-phone for IW-triphone...1026 added as logical...done
reading [grammar/sample.dfa] and [grammar/sample.dict]...
Reading in dictionary...
23 words...done
Reading in DFA grammar...done
- Gram #0: read
[grammars]
  # 0: [active     ]   23 words,   6 categories,    7 nodes (new) "sample"
gram "sample" registered
- Grammar update check
Mapping dict item <-> DFA terminal (category)...done
- Gram #0: installed
Building HMM lexicon tree...........281 nodes
  coordination check passed
done
now beam width = 200 (guess)
- update completed
[grammars]
  # 0: [active     ]   23 words,   6 categories,    7 nodes "sample"
  Global:              23 words,   6 categories,    7 nodes
Generating addlog table...1953 kb...done
All init successfully done

###### initialize input device
------------- System Info begin -------------
Julian rev.3.5.3-multipath (fast)

Engine configuration:
 - Base setup : fast
 - Tunings    : DFA, LibSndFile, IconvOutput
 - Compiled by: gcc -g -O2

Continuous Speech Recognition Parser based on automaton grammar

Files:
	hmmfilename=acoustic_model_files_build726/hmmdefs
	hmmmapfilename=acoustic_model_files_build726/tiedlist
	grammar #1:
	    dfa  = grammar/sample.dfa
	    dict = grammar/sample.dict

Speech input source: file

Acoustic analysis condition:
	           parameter = MFCC_0_D_N_Z (25 dimension from 12 cepstrum)
	    sample frequency = 16000 Hz
	       sample period =  625  (100ns unit)
	         window size =  400 samples (25.0 ms)
	         frame shift =  160 samples (10.0 ms)
	        pre-emphasis = 0.97
	        # filterbank = 24
	       cepst. lifter = 22
	          raw energy = False
	    energy normalize = False
	        delta window = 2 frames (20.0 ms) around
	            hi freq. = OFF
	            lo freq. = OFF
	     zero mean frame = OFF
	     base setup from = Julius defaults

	spectral subtraction = off

HMM Info:
	4382 models, 2543 states, 2543 mixtures are defined
	          model type = context dependency handling ON
	  training parameter = MFCC_N_D_Z_0
	       vector length = 25
	    cov. matrix type = DIAGC
	       duration type = NULLD
	         mixture num = 1
	       max state num = 5
	  skippable models = sp (1 model(s))

Dictionary Info:
	    vocabulary size  = 23 words, 73 models
	    average word len = 3.2 models, 9.5 states
	   maximum state num = 21 nodes per word
	   transparent words = not exist
	   words under class = not exist

Lexicon tree info:
	     total node num =    281
	      root node num =     22
	      leaf node num =     23

DFA grammar info:
	  7 nodes, 9 arcs, 6 terminal(category) symbols
	  category-pair matrix size is 4 bytes

Weights and words: 
	    (-penalty1) IW penalty1 = +5.0
	    (-penalty2) IW penalty2 = +20.0
	    (-cmalpha)CM alpha coef = 0.050000
	    (-sp)shortpause HMM name= "sp" specified, "sp" applied (physical)
	      found sp category IDs =
	     inter-word short pause = on (append "sp" for each word tail)
	      sp transition penalty = -70.0

Search parameters: 
	          1st pass decoding = batch with sentence CMN
	            1st pass method = 1-best approx. generating indexed trellis
	    (-b) trellis beam width = 200 (-1 or not specified - guessed)
	    (-n)search candidate num= 1
	    (-s)  search stack size = 500
	    (-m)    search overflow = after 2000 hypothesis poped
	            2nd pass method = searching sentence, generating N-best
	    (-b2)  pass2 beam width = 200
	    (-lookuprange)lookup range= 5  (tm-5 <= t <tm+5)
	    (-sb)2nd scan beamthres = 200.0 (in logscore)
	    (-gprune)Gauss. pruning = safe
	    (-tmix)   mixture thres = 2 / 0
	    (-n)        search till = 1 candidates found
	    (-output)    and output = 1 candidates out of above
	     IWCD handling:
	       1st pass: approximation (use max. prob. of same LC)
	       2nd pass: loose (apply when hypo. is popped and scanned)
	     all possible words will be expanded in 2nd pass
	     build_wchmm2() used
	     lcdset limited by word-pair constraint
	     output word confidence measure based on search-time scores

System I/O configuration:
	        speech input source = speech file
	             input filelist = (none, enter filenames from stdin)
	             sampling freq. = 16000 Hz required
	            threaded A/D-in = supported, off
	      zero frames stripping = on
	            silence cutting = off
	           remove DC offset = off
	         reject short input = off
	    short pause segmentation= off
	           result output to = tty (standard out)
	       output charset conv. = disabled

------------- System Info end -------------

------
### read waveform input
enter filename->essai8_julius.wav 

input speechfile: essai8_julius.wav
file format: Microsoft WAV, Signed 16 bit PCM, file native endian, 16000 Hz, 1 channels
139000 samples (8.69 sec.)
### speech analysis (waveform -> MFCC)
length: 867 frames (8.67 sec.)
### Recognition: 1st pass (LR beam with word-pair grammar)
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

pass1_best: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s>
pass1_best_wordseq: 0 2 4 4 4 4 4 1
pass1_best_phonemeseq: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil
pass1_best_score: -21726.998047

### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=867
sentence1: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s>
wseq1: 0 2 4 4 4 4 4 1
phseq1: sil | f ow n | jh ao r d ax n | jh aa n s t ax n | jh aa n s t ax n | jh aa n s t ax n | s t iy v | sil
cmscore1: 1.000 0.999 0.599 0.952 0.842 0.851 0.590 1.000
score1: -21681.556641
65 generated, 65 pushed, 9 nodes popped in 867
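
The sentence1 and cmscore1 lines above can be lined up to see which words the decoder was least sure about. The Python sketch below assumes that each confidence value corresponds to the word at the same position, which is consistent with both lines having eight entries; the 0.6 threshold is arbitrary.

```python
def pair_confidences(sentence_line, cmscore_line):
    """Pair each recognized word with its confidence score."""
    words = sentence_line.split(":", 1)[1].split()
    scores = [float(s) for s in cmscore_line.split(":", 1)[1].split()]
    if len(words) != len(scores):
        raise ValueError("word and score counts differ")
    return list(zip(words, scores))


# Lines copied from the log above.
SENTENCE = "sentence1: <s> PHONE JORDAN JOHNSTON JOHNSTON JOHNSTON STEVE </s>"
CMSCORE = "cmscore1: 1.000 0.999 0.599 0.952 0.842 0.851 0.590 1.000"

pairs = pair_confidences(SENTENCE, CMSCORE)
low = [word for word, score in pairs if score < 0.6]  # arbitrary threshold
print(low)  # ['JORDAN', 'STEVE']
```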

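The figures in the "Acoustic analysis condition" block and the frame count reported for essai8_julius.wav are consistent with each other. The Python sketch below applies the usual sliding-window formula; it reproduces the 867 frames in the log, but it is not confirmed to be the exact computation Julius performs.

```python
SAMPLE_RATE_HZ = 16000  # "sample frequency" in the log
WINDOW_SAMPLES = 400    # "window size"
SHIFT_SAMPLES = 160     # "frame shift"


def frame_count(samples, window=WINDOW_SAMPLES, shift=SHIFT_SAMPLES):
    """Number of full analysis windows that fit in the signal."""
    if samples < window:
        return 0
    return (samples - window) // shift + 1


print(frame_count(139000))                     # 867
print(10_000_000 // SAMPLE_RATE_HZ)            # 625 (sample period, 100 ns units)
print(1000 * WINDOW_SAMPLES / SAMPLE_RATE_HZ)  # 25.0 (window, ms)
print(1000 * SHIFT_SAMPLES / SAMPLE_RATE_HZ)   # 10.0 (shift, ms)
```
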

to do


