--- Log opened Fri Feb 15 00:00:46 2013
00:02 -!- heiko [] has quit [Quit: Leaving.]
00:26 <n4nd0> good night guys
00:26 -!- n4nd0 [] has quit [Quit: leaving]
00:56 -!- FSCV [~FSCV@] has quit [Quit: Leaving]
06:47 -!- blackburn [~blackburn@] has quit [Quit: Leaving.]
06:59 -!- pari [6eeaec2a@gateway/web/freenode/ip.] has joined #shogun
07:06 <pari> Hi all, I am new to this community and am willing to start contributing to the toolbox. It would be very helpful if you could supply me with possible ideas/areas to look into. Thanks in advance.
07:33 -!- blackburn [] has joined #shogun
07:34 <blackburn> pari: try to run compile and then try some examples, subscribe to the mailing list and check out ideas at github
07:37 <blackburn> *remove 'run' word
07:55 -!- sumit [73f91219@gateway/web/freenode/ip.] has joined #shogun
08:30 <@sonney2k> blackburn, what is wrong with the API? You could easily create SGMatrix etc. from opencv matrices
08:30 <@sonney2k> thanks to us using MALLOC & friends
08:30 <@sonney2k> with zero overhead...
08:31 -!- pari [6eeaec2a@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
08:31 <blackburn> sonney2k: no, nothing wrong at all - I just have a feeling it could be better somehow
08:31 <blackburn> sonney2k: although I have no idea how
08:31 <@sonney2k> SGMatrix<float64_t>(opencvmat, m, n, false)
08:31 <@sonney2k> then do whatever?
08:32 <@sonney2k> how could it be simpler?
08:32 <@sonney2k> only way I can think of is directly using their features
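[editor's note: the zero-copy wrapping sonney2k describes (an SGMatrix pointing at memory owned by another library, nothing copied) can be illustrated outside shogun with plain NumPy; `buf` below is a hypothetical stand-in for a buffer owned by e.g. an OpenCV Mat]

```python
import numpy as np

# 'buf' stands in for memory owned by another library (e.g. an OpenCV Mat)
buf = np.arange(6.0)

# reinterpret the same memory as a 2x3 matrix; reshape returns a view, no copy
view = buf.reshape(2, 3)
view[0, 0] = 42.0

# the write is visible through the original buffer, proving nothing was copied
print(buf[0])
```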
08:38 -!- hoijui [] has joined #shogun
08:50 <blackburn> sonney2k: have you ever played sid meier's alpha centauri?
09:22 -!- sumit [73f91219@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
09:45 -!- n4nd0 [] has joined #shogun
09:48 <sonne|work> n4nd0: do we have any news on the website? it looks awful now, intermixed with the WS
09:49 <n4nd0> sonne|work: I started taking a look at the website code on github last week but nothing else
09:50 <n4nd0> I will focus on it from now on during my shogun time
09:51 <blackburn> sonne|work: what is WS?
09:51 <n4nd0> blackburn: workshop
10:37 -!- flx_ [] has joined #shogun
10:37 -!- flx_ [] has left #shogun []
10:40 -!- flxb [] has joined #shogun
10:47 -!- flxb [] has quit [Quit: leaving]
10:48 -!- flxb [] has joined #shogun
11:25 -!- flxb [] has quit [Ping timeout: 244 seconds]
12:03 -!- flxb [] has joined #shogun
12:34 -!- flxb [] has quit [Ping timeout: 244 seconds]
12:56 -!- heiko [] has joined #shogun
13:15 -!- n4nd0 [] has quit [Ping timeout: 252 seconds]
13:20 -!- flxb [] has joined #shogun
13:30 -!- n4nd0 [] has joined #shogun
13:30 -!- n4nd0 [] has quit [Client Quit]
13:30 -!- n4nd0_ [] has joined #shogun
14:19 -!- n4nd0_ is now known as n4nd0
14:20 <n4nd0> blackburn: Georg has done a very nice job!
14:23 -!- hoijui [] has quit [Quit: Leaving]
14:26 <blackburn> n4nd0: yeah
14:27 -!- tom___ [2eda6d52@gateway/web/freenode/ip.] has joined #shogun
14:27 <n4nd0> blackburn: so what do you think of his main idea? the first one listed in the gdocs
14:27 <n4nd0> hey tom___
14:28 <tom___> I would like some clarification about output formats in shogun (hdf5, json, ascii ...)
14:28 <tom___> which one is more mature? which one would you recommend?
14:28 <n4nd0> tom___: what do you want to do?
14:28 <tom___> saving/loading multiclass machines, features, labels ...
14:29 <n4nd0> I guess it depends a bit on your preferences, what you feel most comfortable with
14:29 <tom___> I am running with the C++ API and would like to plot some results with python, so save in C++ and load in python
14:29 <n4nd0> if you are going to use large volumes of data, then you should maybe go for hdf5
14:30 <n4nd0> AFAIK it should be the most efficient of those
14:30 <tom___> the size of ascii files seems smaller than with hdf5? possible?
14:30 <n4nd0> if you are not storing a lot of data, it might be possible
14:31 <tom___> both formats are compatible with the python API?
14:32 <n4nd0> tom___: using shogun methods you mean?
14:32 <n4nd0> yeah, they should be
14:32 <n4nd0> have you taken a look at the python examples?
14:32 <n4nd0> let me see
14:32 <tom___> what format do you use yourself? :)
14:34 <n4nd0> I have not really used them for large applications
14:34 <n4nd0> just small tests, and IIRC I used ascii
14:35 <n4nd0> tom___: I see that features_io_modular.py
14:35 <n4nd0> under examples/undocumented/python/
14:35 <n4nd0> uses the three formats
14:35 <n4nd0> ascii, binary and hdf5
14:36 <n4nd0> maybe it can help you decide which one you want to select
14:36 <tom___> all right, thanks for the advice
14:36 <tom___> have a nice day
14:37 <n4nd0> tom___: thanks, you too
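[editor's note: tom___'s observation that ascii files can come out smaller than hdf5 for small data is plausible because hdf5 carries a fixed container overhead of a couple of KB per file. The tradeoff can be seen without hdf5 at all by writing the same small matrix in text and raw binary form; paths and sizes below are illustrative, not shogun's I/O]

```python
import os
import tempfile

import numpy as np

X = np.random.RandomState(0).rand(5, 3)
d = tempfile.mkdtemp()
ascii_path = os.path.join(d, "feats.txt")
bin_path = os.path.join(d, "feats.npy")

np.savetxt(ascii_path, X)   # text: ~25 bytes per value at default precision
np.save(bin_path, X)        # binary: 8 bytes per float64 plus a small header

# for tiny matrices the per-format overhead dominates the payload
print(os.path.getsize(ascii_path), os.path.getsize(bin_path))
```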
14:37 -!- tom___ [2eda6d52@gateway/web/freenode/ip.] has quit [Quit: Page closed]
14:42 -!- FSCV [~FSCV@] has joined #shogun
14:43 -!- FSCV_ [~FSCV@] has joined #shogun
14:47 -!- FSCV [~FSCV@] has quit [Ping timeout: 252 seconds]
14:59 -!- blackburn [] has quit [Quit: Leaving.]
15:26 -!- flxb [] has quit [Ping timeout: 264 seconds]
15:48 -!- lambday [b6131049@gateway/web/freenode/ip.] has joined #shogun
16:01 -!- n4nd0 [] has quit [Quit: leaving]
16:14 -!- flxb [] has joined #shogun
16:41 <sonne|work> ^ blackburn - such stuff can only happen in rossia :)
16:52 <heiko> man, I hope blackburn is still alive :)
17:03 <sonne|work> heiko: do you use eclipse and c++?
17:03 <heiko> sonne|work, yes, why?
17:03 <heiko> wiking: that's fake
17:04 <wiking> nice photoshop :D
17:09 -!- flxb [] has quit [Ping timeout: 248 seconds]
17:26 -!- sonne|work [] has quit [Quit: Leaving.]
17:26 -!- heiko [] has quit [Quit: Leaving.]
17:27 -!- heiko [] has joined #shogun
17:53 -!- KMcQuisten [d8338942@gateway/web/freenode/ip.] has joined #shogun
18:12 <KMcQuisten> Is there a set way to make the shogun wrapper of LibSVR perform nu-SVR, or is it hardwired to do epsilon-SVR?
18:17 -!- alexlovesdata [~binder@2001:638:806:e001:eda1:1d04:97b:c9f] has joined #shogun
18:25 -!- FSCV_ [~FSCV@] has quit [Ping timeout: 245 seconds]
18:25 -!- heiko [] has quit [Quit: Leaving.]
18:27 -!- FSCV [~FSCV@] has joined #shogun
18:39 <KMcQuisten> Anyone alive, or are you all sleeping on the other side of the world?
18:39 -!- FSCV_ [~FSCV@] has joined #shogun
18:42 -!- FSCV [~FSCV@] has quit [Ping timeout: 248 seconds]
18:51 -!- KMcQuisten [d8338942@gateway/web/freenode/ip.] has quit [Quit: Page closed]
18:55 -!- heiko [] has joined #shogun
19:07 -!- blackburn [~blackburn@] has joined #shogun
19:08 <heiko> blackburn, does it make sense to allow the tube epsilon for model selection?
19:08 <heiko> and epsilon?
19:08 <blackburn> heiko: I remember that question!
19:08 <heiko> see email
19:09 <blackburn> heiko: yeah
19:09 <heiko> just add this?
19:10 <blackburn> heiko: okay, it won't hurt right?
19:10 <heiko> no, it's just: does it make sense?
19:10 <blackburn> heiko: I think it can be considered
19:10 <blackburn> just some more flexibility
19:11 <heiko> ok then, here we go
19:11 <blackburn> wiking: about the video - it is not fake but it was in Tajikistan :D
19:11 <blackburn> not that meteorite thing
19:12 <blackburn> heiko: ah you said it is fake - it is not :)
19:12 <heiko> blackburn, really?
19:12 <heiko> this was said on a german news website
19:12 <heiko> glad you still live :)
19:12 <blackburn> heiko: it is some gas fire
19:12 <heiko> we would be screwed without you
19:12 <blackburn> heiko: the fake part is that it was caused by the meteorite
19:12 <heiko> yeah that's what I meant, not the meteor
19:12 <blackburn> it happened before in another country
19:13 <blackburn> but actually it happened :)
19:13 <alexlovesdata> good that blackburn did not lose too much of his brain due to the meteorite strike :D
19:13 -!- shogun-notifier- [] has joined #shogun
19:13 <shogun-notifier-> shogun: Heiko Strathmann :master * 9f1daba / src/shogun/classifier/svm/SVM.cpp:
19:13 <shogun-notifier-> shogun: added epsilon and tube_epsilon for model selection
19:13 <shogun-notifier-> shogun: Heiko Strathmann :master * 83d3fbe / src/shogun/classifier/svm/SVM.cpp:
19:13 <shogun-notifier-> shogun: Merge pull request #877 from karlnapf/master
19:13 <shogun-notifier-> shogun: added epsilon and tube_epsilon for model selection
19:13 <blackburn> alexlovesdata: pretty far away from me
19:13 <blackburn> you know, close is 1000 km for me
19:13 <blackburn> it is somewhere around 1500 km away I think :D
19:14 <blackburn> alexlovesdata: I have lost my brain due to vodka though!
19:16 <blackburn> alexlovesdata: see mr. poutine riding
19:18 <blackburn> that's what russian tv looks like
19:19 <alexlovesdata> WW3 welcome :|
19:20 <alexlovesdata> do we have some hidden long range cannon under the north polar ice cap?
19:20 <alexlovesdata> do the russians operate devices like this??
19:20 <blackburn> alexlovesdata: russians exchanged everything for vodka
19:22 <blackburn> alexlovesdata: ahh I was searching for this pic to show you
19:22 <blackburn> alexlovesdata: translating: chelyabinsk (the place hit by that stone) before the stone shower
19:23 <blackburn> and chelyabinsk after the stone shower
19:24 <blackburn> heiko: ha! have you seen that mail? local alignment kernel as a gsoc project
19:25 <heiko> blackburn, good idea
19:25 <heiko> although most of them should be there
19:25 <heiko> it's just a mess to use
19:25 <blackburn> heiko: most of what?
19:25 <heiko> would be good to clean up and have examples and proper documentation
19:25 <heiko> string kernels
19:25 <blackburn> ah yes
19:25 <blackburn> but not only one kernel
19:26 <blackburn> heiko: who can mentor that?
19:26 <heiko> sören :D
19:27 <blackburn> heiko: anyone else? :D
19:27 <heiko> maybe oli stegle?
19:27 <heiko> but he is busy in Cambridge now :=)
19:27 <heiko> don't know
19:27 <heiko> but would be good to have
19:28 <heiko> since there are many biologists out there
19:28 <heiko> and it's an easy project
19:28 <blackburn> the more ideas we have the more difficult it will be to choose
19:28 <heiko> so far most of the projects are more advanced
19:28 <blackburn> but the final choice should get better too
19:28 <blackburn> yes I want to improve the biological part of shogun
19:29 <blackburn> as I don't solve any biological problems I do not understand anything
19:30 <heiko> I know a little bit about these kernels so I could help out a bit if e.g. the main mentor is too busy
19:30 <heiko> but would be better to have some expert
19:30 <blackburn> maybe I should ask chris
19:30 <blackburn> he should know something :D
19:31 <heiko> yes, I mean they are close to this
19:33 <alexlovesdata> I wonder a bit ...
19:33 <blackburn> heiko: have you seen they moved the gsoc thingy a month forward?
19:33 <heiko> not seen
19:33 <blackburn> it ends sept. 27
19:33 <heiko> what does forward mean?
19:33 <heiko> to later
19:33 <alexlovesdata> can we optimize large scale primal svms when the features do not fit into memory?
19:33 <alexlovesdata> I know CStreamingFeatures and vowpal wabbit
19:33 <blackburn> heiko: and it starts a bit later
19:33 <blackburn> heiko: and I won't receive a t-shirt :D
19:34 <alexlovesdata> I was thinking: what strategies do we have for that mega large scale case?
19:34 <blackburn> due to the rules
19:34 <blackburn> alexlovesdata: ultra mega scale
19:34 <blackburn> alexlovesdata: good q
19:34 <blackburn> alexlovesdata: we have an outdated vowpal wabbit though
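[editor's note: the streaming setup alexlovesdata asks about ('give me the next example', as vowpal wabbit and shogun's streaming features do) can be sketched in plain NumPy: examples are read from disk one at a time and a linear SVM is trained with Pegasos-style SGD on the hinge loss, so only the current example ever sits in memory. The file format and helper names here are made up for the sketch, not shogun's API]

```python
import os
import tempfile

import numpy as np

# write a toy dataset to disk, one "label feat1 feat2 feat3" line per example
rng = np.random.RandomState(0)
path = os.path.join(tempfile.mkdtemp(), "data.txt")
w_true = np.array([1.0, -2.0, 0.5])
with open(path, "w") as f:
    for _ in range(2000):
        x = rng.randn(3)
        y = 1.0 if x.dot(w_true) > 0 else -1.0
        f.write("%g %g %g %g\n" % ((y,) + tuple(x)))

def stream_examples(path):
    """'Give me the next example': yield (label, features) one at a time from disk."""
    with open(path) as f:
        for line in f:
            vals = [float(v) for v in line.split()]
            yield vals[0], np.asarray(vals[1:])

# Pegasos-style SGD on the hinge loss; one example in memory at a time
w, lam, t = np.zeros(3), 0.01, 0
for epoch in range(5):
    for y, x in stream_examples(path):
        t += 1
        eta = 1.0 / (lam * t)
        w *= (1.0 - eta * lam)          # shrink (regularization step)
        if y * w.dot(x) < 1:            # hinge-loss margin violation
            w += eta * y * x

print(w)  # should point roughly along w_true
```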
20:30 -!- sumit [73f91219@gateway/web/freenode/ip.] has joined #shogun
20:51 -!- lambday [b6131049@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
20:51 -!- sumit [73f91219@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
20:58 <shogun-notifier-> shogun: Heiko Strathmann :master * 2e37cd5 / / (11 files):
20:58 <shogun-notifier-> shogun: Made new kernel selection methods for linear time MMD available under modular interfaces.
20:58 <shogun-notifier-> shogun: Includes some other minor changes
20:58 <shogun-notifier-> shogun: Heiko Strathmann :master * 17d6ba9 / / (11 files):
20:58 <shogun-notifier-> shogun: Merge pull request #878 from karlnapf/master
20:58 <shogun-notifier-> shogun: kernel selection for linear time mmd for modular interfaces
20:58 -!- heiko [] has left #shogun []
22:23 <@sonney2k> alexlovesdata, hey there
22:23 <alexlovesdata> hey sonney2k
22:24 <alexlovesdata> I am thinking of an MKL extension on cross-validation data
22:24 <alexlovesdata> so that it works better in practice
22:25 <alexlovesdata> on Sunday I consider writing some KDA for shogun ... once I get behind the git monster
22:26 <@sonney2k> alexlovesdata, about the after-workshop happening
22:26 <alexlovesdata> yep ... what will be the input to Andrea?
22:26 <@sonney2k> could you or someone from the TU people take care of the crowd?
22:26 <alexlovesdata> yes I could do that
22:26 <@sonney2k> like AZ
22:27 <alexlovesdata> andreas Z?
22:27 <@sonney2k> or if chris is around (he is officially TU right?)
22:27 <@sonney2k> I am away sadly :/
22:27 <alexlovesdata> I can ask AZ whether he likes to join in helping me ... but what is the motivation of AZ to help me?
22:27 <alexlovesdata> Chris will help because he is clearly involved in it
22:27 <@sonney2k> alexlovesdata, he will mentor ICA for shogun anyways
22:27 <alexlovesdata> ahhh, ok
22:28 <@sonney2k> so it might get him connected to the crowd?
22:28 <@sonney2k> he might like it
22:28 <alexlovesdata> yes ... I can ask him with the pretext that he will anyway mentor it
22:29 <alexlovesdata> any special expectations about taking care of the crowd?
22:29 <@sonney2k> I guess you only need some official who is around from beginning to end on the weekend
22:29 <@sonney2k> you could probably split this up
22:29 <alexlovesdata> should be doable
22:30 <@sonney2k> I would have organized that thing earlier but we wanted it to be as close as possible to ISMB
22:30 <@sonney2k> (and me around) might not be close enough but nevertheless
22:31 <@sonney2k> alexlovesdata, for ultra mega large scale we use mean(x) > threshold
22:31 <@sonney2k> btw ;)
22:31 <alexlovesdata> how does mean(X) > threshold help in that?
22:32 <@sonney2k> alexlovesdata, ultra mega large scale boils down to computing *simple* statistics
22:32 <@sonney2k> and simple rules
22:32 <@sonney2k> no more
22:32 <@sonney2k> everything else is beyond ...
22:32 <@sonney2k> but yes we don't have anything distributed
22:32 <@sonney2k> as in map-reduce & friends
22:32 <alexlovesdata> hmmm ... that could be something for GSOC ;)
22:32 <@sonney2k> so we can only realistically handle a billion examples
22:33 <alexlovesdata> where does the limit come from?
22:33 <@sonney2k> actually there is no limit
22:33 <@sonney2k> it is just that you need a fast harddisk to stream data
22:34 <alexlovesdata> svm optimization could be parallelized in principle
22:34 <@sonney2k> with streaming features you can just ask: 'give me the next example'
22:34 <@sonney2k> alexlovesdata, yeah but that is not the bottleneck
22:34 <@sonney2k> the bottleneck is reading from disk
22:34 <alexlovesdata> well ... fast hard disks are no problem since the arrival of ... ;)
22:34 <@sonney2k> they are dog slow
22:34 <@sonney2k> you can do thousands of svm iterations while reading the next example from disk
22:35 <@sonney2k> so mini-batch algorithms are fast
22:35 <@sonney2k> yes even ssds are dog slow
22:35 <alexlovesdata> but one could work on distributed batches
22:35 <@sonney2k> it is like <300MB/s disk vs. 10GB/s main memory speed vs. 1000GB/s cache speed
22:36 <alexlovesdata> the problem is an interface that is flexible enough for a lot of ways of parallelization
22:36 <@sonney2k> that would just give you cache misses
22:36 -!- sumit [75e10b98@gateway/web/freenode/ip.] has joined #shogun
22:36 <alexlovesdata> I would not dare to co-mentor GPU svms
22:36 <@sonney2k> but I agree there is lots of stuff one could do
22:36 <@sonney2k> GPU svms are overhyped
22:36 <@sonney2k> they can be *only fast* if all data fits in GPU memory
22:37 <alexlovesdata> yep ...
22:37 <@sonney2k> so let's say you have 2 G of GPU mem
22:37 <alexlovesdata> I already had that experience with clustering on GPUs
22:37 <@sonney2k> that would be a tiny problem
22:37 <@sonney2k> so it is useless
22:37 <alexlovesdata> but distributed SVMs are a thing we can try
22:37 <@sonney2k> considering that I was solving 60G problems with SVMs 5 years ago
22:38 <@sonney2k> I would say linear SVMs with piecewise linear functions is the thing you should use
22:38 <alexlovesdata> piecewise linear on what?
22:38 <alexlovesdata> you mean locally linear SVMs?
22:38 <@sonney2k> with shogun these piecewise linear functions can be implicitly used, so you can have an svm normal vector of a few GB but only say 100-dimensional input data
22:39 <@sonney2k> no, linear SVMs but you use binary features:
22:39 <@sonney2k> look at
22:40 <@sonney2k> got to leave the train
22:40 <alexlovesdata> why lose information with binarization?
22:40 <@sonney2k> you hardly do
22:41 <alexlovesdata> is there any paper on that to understand it?
22:42 <alexlovesdata> ok I understand it
22:43 <alexlovesdata> roughly ... not fully
22:49 -!- n4nd0 [] has joined #shogun
22:57 <@sonney2k> alexlovesdata, I got some pretty good results with that and extremely fast... it is more expressive than linear - just depending on the number of bins
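[editor's note: sonney2k's binarization trick is to bucket each raw input dimension into bins and feed the one-hot bin indicators to a linear model, which then learns an independent weight per bin, i.e. a piecewise-constant function of the raw input. A hedged NumPy sketch on a 1-D toy problem, using least squares in place of an SVM solver; the bin count and target are arbitrary demo choices]

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, 500)
y = np.sin(x)  # nonlinear target a single linear feature cannot fit

# binarize: one-hot indicator of which of B bins each x falls into
B = 20
edges = np.linspace(-3, 3, B + 1)
idx = np.clip(np.digitize(x, edges) - 1, 0, B - 1)
Z = np.zeros((len(x), B))
Z[np.arange(len(x)), idx] = 1.0

# least squares on the binary features = one learned weight per bin,
# i.e. a piecewise-constant fit of the nonlinear target
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
err_binned = np.mean((Z.dot(w) - y) ** 2)

# compare with a plain linear fit on the raw feature
a, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
err_linear = np.mean((x * a[0] - y) ** 2)

print(err_binned, err_linear)  # binned model fits far better
```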
22:59 <@sonney2k> alexlovesdata, what I would hope is that fastfood (smola's paper) would give us sth even better w/ kernels
23:00 <alexlovesdata> I have to read the fastfood paper
23:00 <alexlovesdata> well ... for the most common kernel ... linear
23:00 <alexlovesdata> RF features always allow using primal solvers
23:00 <alexlovesdata> similar developments exist for other kernels ... see the paper by vedaldi
23:02 <alexlovesdata> random fourier features
23:02 <alexlovesdata> one needs to be careful about the kernel weight
23:03 <alexlovesdata> but one can get a transform such that the transformed features approximate a gaussian kernel
23:08 <n4nd0> sonney2k: you won't be around during the workshop? :(
23:16 <@sonney2k> n4nd0, I will be around during the workshop but not on saturday/sunday in the hands-on session :/
23:16 <@sonney2k> alexlovesdata, that you can do such a transformation is not sufficient
23:16 <n4nd0> oh that's a pity :(
23:16 <@sonney2k> alexlovesdata, you cannot precompute these features
23:16 <n4nd0> sonney2k: but we will see you during the workshop :)
23:16 <@sonney2k> alexlovesdata, you have to do that on-the-fly and very very fast
23:16 <@sonney2k> otherwise you gain nothing
23:17 <alexlovesdata> you just need to save the generating coefficients
23:17 <alexlovesdata> then RF acts as a preprocessor
23:17 <alexlovesdata> which transforms on the fly
23:17 <alexlovesdata> or do you see any problems with that?
23:27 <@sonney2k> alexlovesdata, no - I am just thinking that this would be some easy student task to prepare for gsoc
23:27 <@sonney2k> implement algorithm 1 here
23:28 <@sonney2k> just as a shogun preprocessor
23:28 <alexlovesdata> it is implemented already :D
23:28 <@sonney2k> for the gaussian kernel
23:28 <@sonney2k> alexlovesdata, what?
23:28 <alexlovesdata> wait ... I'll find the name of it in shogun
23:28 <@sonney2k> ohh randomfouriergausspreproc :D
23:28 <alexlovesdata> hahaha, yes
23:29 <@sonney2k> stso then randombinning too :D
23:29 <n4nd0> nobody really knows everything in SHOGUN :D
23:29 <alexlovesdata> stso = ?
23:29 <alexlovesdata> shogun ... the dark dungeons
23:30  * sonney2k is impressed
23:30  * n4nd0 too
23:30 <@sonney2k> I guess I just use 10% of shogun
23:30 <alexlovesdata> most of us use 0.5% I would say
23:31 <alexlovesdata> anyway it might be useful to read about fastfood
23:31 <alexlovesdata> I just want to get this tourine here going and then drive home
23:32 <alexlovesdata> tourine = routine
23:35 <@sonney2k> alexlovesdata, I think this preprocessor would better be defined as DotFeatures... then any linear SVM could directly work on this
23:35 <alexlovesdata> you are right ... I could do that on Sunday
23:35 <@sonney2k> would be faster than preprocessing all the time
23:35 <alexlovesdata> or try ;)
23:36 <@sonney2k> alexlovesdata, btw where did you do the FFT?
23:36 <alexlovesdata> + write an example which compares against a real gaussian kernel on small scale data
23:36 <@sonney2k> and where is the sin() ?
23:37 <alexlovesdata> you need only the cos ... if I remember correctly
23:38 <@sonney2k> in the paper above they have sin(w'x) and cos(w'x)
23:40 <alexlovesdata> if I remember correctly there was a way to get around the sin ...
23:40 <alexlovesdata> I will check that on sunday
23:42 <alexlovesdata> below equation (2) in the paper
23:42 <alexlovesdata> other mappings such as ...
23:42 <alexlovesdata> also satisfy the condition ...
23:43 <alexlovesdata> there is a mapping cos(wx + b)
23:43 <alexlovesdata> that I had implemented
23:44 <@sonney2k> sqrt(2) cos(wx+b) ok
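[editor's note: the cos(wx + b) mapping being discussed is the Rahimi & Recht random Fourier features construction: draw W with Gaussian rows and uniform phases b, then z(x) = sqrt(2/D) cos(Wx + b) satisfies z(x)·z(y) ≈ exp(-||x-y||²/(2σ²)), so no sin term is needed. A small NumPy check of that approximation, with D and σ chosen arbitrarily for the demo]

```python
import numpy as np

rng = np.random.RandomState(0)
sigma = 1.0           # Gaussian kernel width: k(x,y) = exp(-||x-y||^2 / (2 sigma^2))
d, D = 3, 10000       # input dimension, number of random features

# Rahimi & Recht mapping with the cos(w'x + b) variant:
# z(x) = sqrt(2/D) * cos(W x + b), rows of W ~ N(0, 1/sigma^2), b ~ U[0, 2pi]
W = rng.randn(D, d) / sigma
b = rng.uniform(0, 2 * np.pi, D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W.dot(x) + b)

x, y = rng.randn(d), rng.randn(d)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))
approx = z(x).dot(z(y))   # a plain dot product in the transformed space
print(exact, approx)
```

Because the kernel becomes an ordinary dot product of the transformed features, any primal/linear SVM solver can be used directly, which is exactly the DotFeatures point sonney2k makes below.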
23:45 <@sonney2k> alexlovesdata, I think in fastfood they have RF and also random hadamard features
23:45 <@sonney2k> but this stuff really only kicks ass if we have it as dot features
23:45 <alexlovesdata> hadamard ... could be
23:45 <alexlovesdata> I will take care of it on sunday for dot features
23:46 <alexlovesdata> and of an example
23:46 <alexlovesdata> ... will do the example first
23:46 <@sonney2k> but still consider we need 1000 w's for the approximation - it might actually be very expensive compared to learning piecewise linear functions
23:47 <@sonney2k> but I never cared enough to compare this
23:47 <alexlovesdata> it blows up dimensions, though 100 was enough when I tested it
23:47 <alexlovesdata> that's why I asked about strategies for when the features do not fit into memory
23:48 <@sonney2k> alexlovesdata, with dot features you don't need factor 100 more memory
23:48 <@sonney2k> but just the SVM w will be 100 times bigger
23:48 <@sonney2k> that can be very fast - since it is also very cache-friendly
23:52 <@sonney2k> maybe we can broaden the fastfood project in this respect
23:53 -!- sumit [75e10b98@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
23:54 <alexlovesdata> I'm going home
23:54 <alexlovesdata> will continue from home :|
23:54 <@sonney2k> alexlovesdata, see you!
23:55 -!- alexlovesdata [~binder@2001:638:806:e001:eda1:1d04:97b:c9f] has left #shogun []
23:58 -!- shogun-notifier- [] has quit [Quit: transmission timeout]
23:59 <n4nd0> it got lost after the fifth hop here
--- Log closed Sat Feb 16 00:00:14 2013