Thursday, February 28, 2013

Unity + Derp = Derpity

I was on schedule to run the whole pipeline once without any algorithm changes, and then I realized I have no idea how to use Unity (haha). I finished generating the data file from the SPR portion of the pipeline while I was sleeping last night, and this morning I woke up ready to feed it into Unity.

Behold: I open up the project file, and I have no idea what any of the panels do. So I watched a bunch of tutorials (like this one: http://www.youtube.com/watch?feature=player_embedded&v=hbjB80-Mc7E ).

Very helpful. Now I am reading the code and trying to figure out how to specify which sound to propagate. Maybe Pengfei will be in the lab tomorrow (Thursday); he was busy today >x< ...

The current code is confusing in "initmap" of soundpropogator. I have a vague idea that somewhere I should be replacing the dataset, but I haven't found where that lives yet.

I think the code is fine in the init steps, however. The receiver is outputting a signal. (I can't tell which signal, but there is definitely a signal.)

I'll take another look tomorrow.

Tuesday, February 26, 2013

Error using ==> spectrogram


I was getting this error for the longest time... It turns out it's not the MATLAB code's fault. It's my test file's fault. The sound files need to be mono instead of stereo. haha
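For future reference, the mixdown itself is trivial. Here's a quick sketch (in Python/NumPy rather than the MATLAB I'm actually using, and averaging the two channels is just one common choice of mixdown):

```python
import numpy as np

def stereo_to_mono(samples: np.ndarray) -> np.ndarray:
    """Collapse an (n_samples, 2) stereo array to mono by averaging channels."""
    if samples.ndim == 1:
        return samples  # already mono
    return samples.mean(axis=1)

# Tiny synthetic example: two channels that cancel out when averaged.
stereo = np.stack([np.ones(4), -np.ones(4)], axis=1)
mono = stereo_to_mono(stereo)
print(mono.shape)  # (4,)
```

MATLAB's spectrogram wants a vector, not an n-by-2 matrix, which is (I think) why the stereo files choked.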

Here are some results. They look better than the last set of data I was using C:
Now I need to do the hand-editing part where I stick all these files through Photoshop.



A comparison of old data and new data.


 above: new 'b'
below: old 'b'




====================================================================





 above: 'a' sound. A vowel
below: 'p' sound. A consonant




 above: 's' sound. A consonant, fricative, voiceless.
below: "ou" sound as in "out". A diphthong

Monday, February 25, 2013

Met with Pengfei today

Pengfei showed me the whole pipeline today and was very helpful with the few questions I had.
I realized I had actually understood the pipeline wrong.

TODO for alpha: run through the whole pipeline once with my data. I hope I can hit this milestone.

1) generate images for sound extraction (done)
2) hand edit images (current)
3) run images through comparison code
4) propagate
5) compare

Sound sources using now:
 http://www.teachingenglish.org.uk/activities/phonemic-chart
http://manual.audacityteam.org/man/Tutorial_-_Recording_audio_playing_on_the_computer

I really liked the sound sources from the first link. So, using the tutorial on recording within the computer, I am creating a new library of playable sounds.

Friday, February 22, 2013

sound files + bleh consonants

Made a matrix and downloaded some sound files (sources below). Pengfei's MATLAB code now saves graphs automatically because I got too lazy to save them by hand C: (tehe I love MATLAB)

http://beta.freesound.org/people/janmario/downloaded_packs/
http://www.phonetics.ucla.edu/course/chapter1/chapter1.html

I think I need to record my own sound files. The sound files I downloaded have too much vowel in them. The image below is the 'p' sound (as in "lip"). The sound file I downloaded from online is really "pa". This makes sense because a consonant is aperiodic, and thus you can't really pronounce the consonant without the vowel. In terms of the graph below, the "p" sound is really the vertical blue lines at the beginning, and the "a" is the red part. I don't think Pengfei's code (as of right now) can deal with the level of detail at which consonants need to be evaluated.




At some point (after figuring out the consonants), we need to categorize the different consonants so we can create an HCA tree.


And because these images are so pretty C:

Above: the vowel in "hot"
Below: the 'm' in "am"


And again, the "hot" sound is fine because it is a vowel. The "m" sound is really "ma", and you can see that what we need to take out is just the blue stripe at the beginning. The rest of the data is really an unnecessary "a" sound.
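If I ever want to automate that hand-editing, one crude idea is to cut where the frame energy jumps, since the vowel is much louder than the consonant onset. A sketch (Python/NumPy, not the project's MATLAB; the frame length and the 0.5 threshold are completely made-up knobs):

```python
import numpy as np

def split_consonant_onset(signal, frame_len=256, thresh=0.5):
    """Crude split: keep only the samples before the first frame whose RMS
    energy exceeds `thresh` times the maximum frame energy (the loud vowel)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    loud = np.nonzero(rms > thresh * rms.max())[0]
    cut = loud[0] * frame_len if len(loud) else len(signal)
    return signal[:cut]

# Synthetic test: 1024 samples of faint noise (the "consonant"),
# then a loud 220 Hz sine (the "vowel").
rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal(1024)
loud = np.sin(2 * np.pi * 220 * np.arange(1024) / 8000.0)
onset = split_consonant_onset(np.concatenate([quiet, loud]))
print(len(onset))  # 1024: only the quiet onset survives
```

This obviously wouldn't handle voiced consonants like 'm' well (they aren't silent), so Photoshop it is for now.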


Actually, looking at all these graphs reminds me of an art project one of my teachers showed me. Spoken word is actually really weird and chaotic. Breaking speech down into phonemes works to a certain extent, but it is actually a pretty bad way of reproducing speech. The artist in the video recorded himself speaking all the different phonemes and tried to speak by pasting the phonemes together. (You can clearly hear that it doesn't work well.)



Sunday, February 17, 2013

Confusion matrix

http://pubman.mpdl.mpg.de/pubman/item/escidoc:67125:6/component/escidoc:67126/Consonant+And+Vowel+Confusion+Patterns+By.pdf

http://pubman.mpdl.mpg.de/pubman/item/escidoc:60592:2/component/escidoc:60593/Cutler_2004_patterns.pdf

http://people.cs.uchicago.edu/~dinoj/research/confmat.html

The one problem that I have with consonants is that anywhere I download them, they are flanked by a vowel of some sort (because consonants are more easily recognized when they have vowels attached). But I'm pretty sure for this program we want each consonant all by itself. Maybe I should just record them... aha

Friday, February 15, 2013

Meeting notes 2/15/2013

Keep revising the document throughout the project's duration.

I will get the MATLAB code this weekend and use it to generate packets: input the sound, output a flat file, and read it into Unity.

0th step: use Pengfei's code as-is to propagate.
Use the Unity code to get the distorted packet.
See if the simple MATLAB code is enough.

The MATLAB code needs a confusion matrix. Idea: the discretization in our code should be the same as the one in real life.

Project Proposal C:

Just sent out the revised proposal. Took me a while (somehow writing doesn't come easily to me).
The reference material section hasn't really been revised, but I do have a list of the sources I am using.

From my todo list, I didn't play with MATLAB. I did play with Pengfei's code for about 30 minutes. I don't think I was productive for those 30 minutes. I will try again in the coming week when I have more time.

Oh, I'm not sure what the SVN access is for... I should ask about that.

Obligatory post of the week?

Rewriting my project proposal as I am typing this. Will post it when I'm done. It seems that SPREAD works very, very differently than what I originally thought when I first read the paper. I'm actually a bit worried that my idea to distinguish between sounds is impossible. In fact, I'm almost sure that doing this will be impossible given SPREAD's current data structure. At the same time, this makes me wonder what I can add to SPREAD so that the propagation of speech sounds becomes doable.

I just sent an email to someone in the linguistics department asking to meet. I hope that I can spend maybe an hour with them, and they can tell me what the minimal set of data is that a person needs to distinguish phonemes from each other.

======================================

I didn't do much this week because I was cramming for my STAT exam since Monday, and over the weekend I went home for Chinese New Year. I'll make up the time next week.

I made a repository for Pengfei's code on my private GitHub account. I've vaguely looked at it. The fact that it is C# is a bit scary, but I'll trust Pengfei when he says it is easy to use.

I did some more reading in between STAT studying, and I have a vague plan for the pre-processing part. Currently:

Given a sound file, the MATLAB code breaks the sound signal into tiny chunks and does Fourier analysis on each. For each time chunk, the code basically takes the Fourier component with the largest coefficient and propagates that (which is where the term "SPREAD" comes from: the one frequency will become a spectrum of frequencies as it travels through space).
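If I'm reading the description right, the per-chunk "largest coefficient" step amounts to something like this (a Python/NumPy sketch, not the actual MATLAB code; the 1024-sample chunk size is an arbitrary choice of mine):

```python
import numpy as np

def dominant_freqs(signal, fs, chunk=1024):
    """For each fixed-size chunk, return the frequency bin with the
    largest FFT magnitude -- one 'packet' per time chunk."""
    freqs = []
    for start in range(0, len(signal) - chunk + 1, chunk):
        spectrum = np.abs(np.fft.rfft(signal[start:start + chunk]))
        bin_hz = np.fft.rfftfreq(chunk, d=1.0 / fs)
        freqs.append(bin_hz[np.argmax(spectrum)])
    return freqs

# A pure 440 Hz tone should yield ~440 Hz (to within one FFT bin)
# in every chunk.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
peaks = dominant_freqs(tone, fs)
```

Keeping just one frequency per chunk is exactly why vowels (which need formant structure) worry me below.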

I know that to distinguish between vowels, the information of at least 3 formants is needed. So 3 packets (minimum) are needed for each chunk of time. I am planning (once I get the MATLAB code) to see how I could add to it so I can gather the data of 3 formants.
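As a sanity check that "three frequencies per chunk" is even extractable, here's a crude sketch (Python/NumPy; real formant estimation would use something like LPC, so picking the three strongest spectral peaks is only a stand-in I made up):

```python
import numpy as np

def top3_peaks(signal, fs):
    """Very crude formant proxy: the three strongest local maxima of the
    magnitude spectrum. (Real formant tracking would use LPC; this only
    sketches the idea of 'three frequencies per chunk'.)"""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # local maxima: strictly larger than both neighbours
    peaks = np.nonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
    best = peaks[np.argsort(mag[peaks])[::-1][:3]]
    return sorted(float(f) for f in freqs[best])

# A signal with components near 300, 1200, and 2500 Hz
# (roughly where vowel formants live).
fs = 8000
t = np.arange(2048) / fs
sig = (np.sin(2 * np.pi * 300 * t) + 0.8 * np.sin(2 * np.pi * 1200 * t)
       + 0.6 * np.sin(2 * np.pi * 2500 * t))
print(top3_peaks(sig, fs))  # ~[300, 1200, 2500]
```

On a synthetic three-tone signal this recovers the right frequencies; real speech will be much messier.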

===========================================

My alpha is on the 28th at 3:15. I am aiming for some matlab code, and initial tests.


Tuesday, February 12, 2013

TODO this week

I have a STAT midterm this week, but I'm planning the following by Friday:

- rewrite of proposal (I can't really 100% focus on this until Thursday night, after my exam)
- do a vague "plan of attack" (aka: brainstorm ways to approach the problem)
- play with Pengfei's code a bit
- play with MATLAB (a bit of a stretch)

Friday, February 8, 2013

2/8/13 Meeting notes



phonemes --> SPR (sound packet representation) --> propagation step (as is) --> {p}' (distorted sound packets) --> perception

phoneme --> SPR
discretization step --> want to minimize the
choose a particular sound packet representation such that the representation of that packet is most similar to the perceived confusion matrix.

perception :
 a) filtering
 b) DTW (dynamic time warping) *similarity measure --> HMM

(need some kind of confidence)
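Since DTW came up as the similarity measure, here is a minimal textbook version so I remember how it works (a Python sketch; using absolute difference as the local cost is just my placeholder, and real use would compare feature vectors, not scalars):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between
    two 1-D sequences, with absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A sequence compared with a time-stretched copy of itself should be
# much closer under DTW than to an unrelated sequence.
x = [0, 1, 2, 3, 2, 1, 0]
x_stretched = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
print(dtw_distance(x, x_stretched))  # 0.0
```

The time-warping invariance is the appealing part here: two utterances of the same phoneme at different speeds still match.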

MATLAB, optimization: min || C_spr - C_perception ||
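That objective is easy to evaluate once both confusion matrices exist. A trivial sketch (Python/NumPy stand-in for the MATLAB; interpreting || · || as the Frobenius norm is my assumption):

```python
import numpy as np

def confusion_mismatch(c_spr, c_perception):
    """Frobenius norm of the difference between the simulated and the
    perceptual confusion matrices -- the quantity the discretization
    choice should minimize, per the meeting note."""
    return np.linalg.norm(np.asarray(c_spr) - np.asarray(c_perception))

# Identical matrices score 0; any disagreement scores > 0.
c_a = np.eye(3)
c_b = np.eye(3)
print(confusion_mismatch(c_a, c_b))  # 0.0
```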

task 1: do no work at all; stuff it through and see how it goes (which packets survive at the listener)
task 2: apply the filter at the SPREAD step


 Let's assume HMM is solved.

play with sampling and etcetcetc...

Thursday, February 7, 2013

Slides

I keep on forgetting to update this blog.

https://docs.google.com/presentation/d/1eFjVly88QdqGwIjtdX-USomIEKv5i2eYybdmKPDZl14/edit?usp=sharing
I have slides for the presentation tomorrow. I really don't think I'm qualified to give this presentation. I debated how much technical information to include, and I don't think there's that much. All the technical information I vaguely learned this week will, I feel, be uninteresting to the audience. Plus, most of the things I read, I feel that I don't actually understand (most of it I'm reading just to know that it exists somewhere). At some point I lost track of the sources I've been reading. For everything I read, I have to look up half the material (and in that looked-up material, I'm looking up more material...)

A concern for the project, however: it seems that my project is really about how to create an HCA from a confusion matrix. However, this comes after propagation (which also has problems of its own).
1 - I don't even know how propagation will turn out (from my readings, it seems that Fourier coefficients may not be the best features to use for speech recognition)
2 - There's nothing that says an HCA will work on speech sounds. After propagation, if the confusion matrix is very far from the identity matrix (aka close to a matrix of random numbers), then I don't think I can create an HCA tree for the data.
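To convince myself the HCA step is at least mechanical, here is a sketch of building a tree from a toy confusion matrix (Python with SciPy, not the project's pipeline; turning confusion counts into distances via 1 − normalized similarity is my own guess at a scheme):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hca_from_confusion(conf):
    """Build an HCA tree (SciPy linkage matrix) from a confusion matrix:
    frequently-confused phonemes get a SMALL distance, so they merge early."""
    conf = np.asarray(conf, dtype=float)
    sim = (conf + conf.T) / 2.0   # symmetrize
    sim = sim / sim.max()         # normalize to [0, 1]
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    return linkage(squareform(dist, checks=False), method="average")

# Toy 4-phoneme matrix: items 0/1 confuse often, and items 2/3 confuse often.
conf = [[10, 8, 1, 0],
        [8, 10, 0, 1],
        [1, 0, 10, 9],
        [0, 1, 9, 10]]
Z = hca_from_confusion(conf)
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 2 2]: the two confusable pairs cluster together
```

Concern 2 above shows up directly here: if the confusion matrix were near-random, the distances would all be similar and the tree would be meaningless.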



Monday, February 4, 2013

Friday Meeting Notes

Time-domain propagation (with frequency).
1k signal --> scale any frequency into 1 - 1k Hz.

if representing 10k Hz, divide all frequencies by 10.

pretend all frequencies divided by, say, 5 --> stretch out the sound waves.

SPREAD: 50 Hz (64 Hz), then powers of 2. 5 bands... etc etc lala

telephone: 300 --> 3k Hz


so the main idea is just: stretch the signal, propagate, shrink it back to scale.
--> simple experiments
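A tiny sketch of the stretch idea (Python/NumPy, not the project code; linear interpolation is the crudest possible resampler, and real code would low-pass filter properly):

```python
import numpy as np

def scale_frequencies(signal, factor):
    """Divide every frequency in `signal` by `factor` by stretching the
    waveform: resampled to factor*len samples, the same content plays at
    1/factor of the original pitch (duration grows by `factor`)."""
    n = len(signal)
    old_t = np.arange(n)
    new_t = np.linspace(0, n - 1, int(n * factor))
    return np.interp(new_t, old_t, signal)

# A 1 kHz tone stretched by 10 becomes (at the same sample rate) a
# ~100 Hz tone, so its dominant FFT bin should move down by 10x.
fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 1000 * t)
stretched = scale_frequencies(tone, 10)
```

Shrinking back after propagation would be the same operation with factor 1/10.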

ambient sound.

--> iset --> input speech signal and output phonemes



given phonemes
how do phonemes degrade --> natural degradation.
--> phonemes ordered on user evaluation of any pair simulation
--> clustered together

phonemes confusion matrix <-- HCA tree @U@

tehe

implementation --> HCA tree to phonemes
--> confusion tree


problem statement: propagate speech for
proposed solution:
1 - extend sound packet rep
2 - propagate that
3 - perception side: construct HCA

experimental target:
1 - Assume an agent will respond to its name
2 - Will an agent respond to its name after degradation?
3 - Let one agent call another by name

Norm wants the cocktail party effect. C:

input: look into existing methods of detecting phonemes in speech
middle: modify SPREAD packet; HCA tree; confusion matrix
then: give each agent a name, and a matching algorithm
(lip-sync software)
end:

- need a database of 40 phonemes

phonemes paper --> speech signal degradation / voice quality degradation / computational representation of speech

alpha --> pipeline