--- Log opened Sat Sep 24 00:00:21 2011
-!- serialhex [~quassel@99-101-148-183.lightspeed.wepbfl.sbcglobal.net] has quit [Ping timeout: 252 seconds]00:07
-!- serialhex [~quassel@99-101-148-183.lightspeed.wepbfl.sbcglobal.net] has joined #shogun00:15
-!- blackburn [~blackburn@31.28.44.65] has joined #shogun12:13
blackburnsonney2k: can it be wrong includes or so?12:39
blackburnabout arpack dsymv12:39
@sonney2kblackburn, I think we should first check if his LDA really works19:30
@sonney2kand what his configure output is19:30
blackburnsonney2k: he tested LDA and all good19:32
@sonney2khe didn't say that19:32
blackburnwith example from python_modular19:32
blackburnhe told me in private haha19:32
blackburnThe lda example (shogun/examples/documented/python_modular/classifier_lda_modular.py) seems to work. I had to change the data path to add toy.19:32
blackburnI.e. ../data/toy/fm_train_real.dat etc19:32
blackburnsonney2k: the only difference I noticed - I was including cblas in arpack.h and arpack.cpp19:34
blackburnwhile it is included in lapack.h19:36
blackburnit is not necessary19:36
CIA-3shogun: Sergey Lisitsyn master * r826ff44 / (2 files): Removed unnecessary includes in arpack.{cpp,h} - http://git.io/36Q4Cg19:36
blackburnand removed in commit ^19:36
@sonney2kthat all doesn't make sense19:52
@sonney2kI will ask cheng again19:52
CIA-3shogun: Soeren Sonnenburg master * rbe97a43 / (3 files in 3 dirs): fix mixup of epsilon / tube epsilon in libsvr examples - http://git.io/Ty7s-A19:57
blackburnsonney2k: we've got at least one commit per day during Sep 16 - today19:59
@sonney2kand we should keep it like this20:00
blackburnpace many libs don't have :)20:00
blackburnsonney2k: what is the tube epsilon?20:05
@sonney2kthe epsilon tube for the epsilon-insensitive loss in support vector regression20:05
blackburnoookk20:06
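The tube epsilon mentioned above can be illustrated with a minimal NumPy sketch of the epsilon-insensitive loss. The function name `epsilon_insensitive_loss` and the parameter name `tube_epsilon` are chosen here for illustration and are not Shogun's API. The other "epsilon" in the commit message is presumably the solver's stopping tolerance, which is a separate setting.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, tube_epsilon):
    """Loss is zero inside the tube |y - f(x)| <= tube_epsilon, linear outside."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - tube_epsilon)

# Hypothetical targets and predictions, only to show the shape of the loss.
losses = epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 1.0], tube_epsilon=0.1)
print(losses)
```

The first prediction falls inside the tube and costs nothing; the other two are penalized only by the amount they exceed the tube.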
blackburnnew algo to go into shogun in 10 minutes!20:25
blackburn:D20:25
CIA-3shogun: Sergey Lisitsyn master * r3308301 / (6 files in 3 dirs): Introduced DiffusionMaps dimension reduction preprocessor - http://git.io/5M4CKw20:45
blackburnsonney2k: vodka!20:46
@sonney2kheh20:46
@sonney2kblackburn, btw which dim red algo would you recommend to visualize data :)20:46
blackburnsonney2k: depends what is the data20:47
@sonney2ksome real-valued inputs20:47
@sonney2knot much more prior knowledge really - hard to classify20:47
blackburnif there could be any underlying manifold - you can try LTSA20:48
blackburnit is pretty fast and robust20:48
blackburnwell but MDS/Isomap would be useful too20:48
@sonney2kany thoughts on PCA / kPCA - shouldn't I do these first?20:49
blackburnwhy not :)20:49
blackburnand we have kernel PCA, kernel LLE, kernel diffusion maps20:50
@sonney2kthx20:51
@sonney2kblackburn, btw recently someone here on IRC asked me about what our feature roadmap for shogun is20:51
@sonney2kI am running a bit out of ideas what we want to focus on20:51
blackburnI'm still focused on dim reduction20:51
blackburnno idea what is your focused on :)20:51
@sonney2kshogun does a  lot of stuff nowadays - so it is really not clear to me how to really improve it20:52
@sonney2ksome more dim red methods are not really 'the big picture'20:52
@sonney2kand I also only have small things, like parallelize more code, cleanups, model selection w/ nice syntax etc20:52
blackburnsonney2k: I don't know about my plans on the long run20:53
blackburnsonney2k: http://www.kongregate.com/games/banthar/hell-tetris20:54
@sonney2kblackburn, for example features like massive parallel or mpi or neural networks or gaussian processes or whatever20:56
@sonney2kblackburn, ^game is this even possible20:56
blackburnsonney2k: oh I hate neural networks :D20:57
blackburnno idea if it is possible20:57
blackburnI'm afraid I can't plan such a grand new features20:57
@sonney2kblackburn, then it is not likely that you impl. them :D20:57
@sonney2kfor gsoc next year (if we want to participate) we need to20:57
blackburngaussian processes is something chris likes very much :)20:58
blackburnsonney2k: I was thinking about high performance computing things but I don't know if it is even possible to do20:59
blackburnwithout serious architecture changes, etc21:00
@sonney2kblackburn, the problem with GPs is that it needs some matrix inverse (so the standard alg's are n^3)21:00
@sonney2kand you need s.o. really deep into it21:00
@sonney2k(I am not)21:00
blackburnsonney2k: most of dimreduction algos are n^3 :)21:01
@sonney2khpc stuff is sth you cannot do in general for all of shogun21:01
@sonney2kso this is sth special for certain algos21:01
@sonney2kblackburn, thinking about it - I guess I would most prefer to develop shogun in two ways21:02
@sonney2k1) large scale / hpc stuff of whatever kind21:02
@sonney2k2) breadth - many ml baseline algorithms (not necessarily fast)21:02
@sonney2kso one always has some baseline to play with21:03
blackburn1) is preferrable for me but21:03
@sonney2klets call it the 'hammer'21:03
@sonney2kand then if one knows what the baseline is can do 1) stuff on top of it21:03
blackburnI guess not MPI, but OpenCL, etc21:04
blackburnsonney2k: I've lost an idea. what to call 'hammer'?21:04
@sonney2kI wouldn't want to get each and every algorithm in there - but maybe only the most successful ones21:04
@sonney2khammer == 2)21:04
@sonney2kblackburn, one gsoc project could be some kind of opencv interfacing21:05
blackburnopencv?21:05
@sonney2kthe big computer vision lib21:05
blackburnI know21:05
blackburnbut surprised21:05
@sonney2kwhy?21:05
blackburnthey have their own impls21:05
blackburnof some algos21:05
@sonney2kone could get features from opencv and do some training on top of it21:05
@sonney2kwith some nice example21:06
blackburnsonney2k: I guess it would be better to have shogun in opencv21:06
blackburnyou could ask opencv guys about it21:07
@sonney2kblackburn ?21:07
@sonney2kI don't understand21:07
@sonney2kshogun in opencv?21:07
@sonney2kwhat does that mean21:07
blackburnI mean it would be nice to become a machine learning library for opencv21:07
@sonney2kI think you can already use it for that purpose21:08
blackburnsonney2k: but have a nice interface in opencv to shogun would be nice21:09
blackburnin more transparent way or so21:09
@sonney2kI have no idea what that could be - I mean opencv produces any kind of feature representation one can think of21:10
@sonney2kso using that representation + some shogun algo would work already21:10
blackburnsonney2k: that would be nice to treat OpenCV images as features of shogun somehow21:11
blackburnno idea about specific ways to do it21:12
@sonney2kblackburn, well I will meet gary at the gsoc mentors meeting - I will ask him21:12
blackburnsonney2k: will you go to mentors meeting?21:12
@sonney2kyes21:12
blackburnnice21:12
blackburnchris too?21:12
@sonney2kyes us two21:12
blackburnI see21:13
blackburnsonney2k: will normalization of this kind:21:13
blackburn    X = X - min(X(:));21:13
blackburn    X = X / max(X(:));21:13
blackburnchange the kernel matrix?21:13
blackburn*gaussian kernel21:13
@sonney2kyes21:13
blackburnhow significant?21:14
@sonney2kwait first one not21:14
@sonney2ktranslation invariant but not scale21:14
@sonney2kbecause you need to rescale kernel width21:14
blackburnaha, I see21:14
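The point about translation versus scale can be checked numerically. This is a small sketch, assuming a Gaussian kernel of the form exp(-||x - y||^2 / width); the helper `gaussian_kernel` and the toy matrix are made up for the example and do not use Shogun.

```python
import numpy as np

def gaussian_kernel(X, width):
    """Kernel matrix for the rows of X, assuming k(x, y) = exp(-||x - y||^2 / width)."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / width)

# Toy data: three 2-d points (rows are samples).
X = np.array([[1.0, 2.0], [3.0, 7.0], [5.0, 4.0]])
width = 2.0

shifted = X - X.min()      # X = X - min(X(:))
scale = shifted.max()
scaled = shifted / scale   # X = X / max(X(:))

K_original = gaussian_kernel(X, width)
K_shifted = gaussian_kernel(shifted, width)                # unchanged: only differences matter
K_scaled = gaussian_kernel(scaled, width)                  # changed: distances shrink by 1/scale
K_rescaled = gaussian_kernel(scaled, width / scale ** 2)   # recovered by rescaling the width

print(np.allclose(K_original, K_shifted))
print(np.allclose(K_original, K_scaled))
print(np.allclose(K_original, K_rescaled))
```

Subtracting the minimum leaves every pairwise difference intact, so the kernel matrix is the same. Dividing by a constant s divides squared distances by s^2, which is equivalent to using a width that is s^2 times larger, hence the need to rescale the kernel width.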
blackburnI guess it would be better to normalize features too21:15
@sonney2kblackburn, so anyone else I should talk to at the mentors meeting?21:17
@sonney2kthe orange guys maybe?21:18
@sonney2ksuch that they could somehow reuse what we provide in shogun in their gui?21:18
blackburnsonney2k: well sure, ask if they want to collaborate21:19
@sonney2kanyone else you could think of?21:19
blackburnno idea21:19
blackburnsonney2k: and no idea how to collaborate with scikits guys21:21
@sonney2kblackburn, well I have one idea - we could provide some interface functions if it helps them to use our methods21:22
@sonney2kso only interfacing21:22
@sonney2kno more21:22
blackburnsonney2k: they have as much as we have21:22
@sonney2kthey have some other things21:23
@sonney2ktoo21:23
blackburne.g. they have gaussian processes21:23
@sonney2kbut not really large scale and only python21:23
@sonney2kyes21:23
blackburnI think it is not the way they would like21:23
blackburnit will make things more complex21:23
blackburnwhile orange core is developed in C++ it would be useful to have some bridge21:24
@sonney2kyeah, they have different focus21:24
blackburnI will take a look how they do the decomposition21:24
blackburnaha I see21:25
@sonney2kI think we should implement the array interface for shogun features http://docs.scipy.org/doc/numpy/reference/arrays.interface.html21:25
blackburnsonney2k: agree21:25
blackburnsonney2k: okay you definitely could ask orange if they want to collaborate21:26
blackburnthey have svms and some classifiers21:26
blackburnbut at least they haven't any of fastest C++ dim reduction preprocessors :D21:27
@sonney2kblackburn, btw one nice addition would be boosting algorithms21:28
@sonney2khttp://mloss.org/software/view/246/21:28
blackburnsonney2k: how can we integrate that?21:28
blackburnwith code or interfacing?21:28
@sonney2kuse their code21:29
@sonney2kand modify it for our purposes21:29
blackburnis it ok?21:29
@sonney2kwhy not?21:30
@sonney2kyou can always do that21:30
@sonney2kit is open source21:30
@sonney2kand gpl21:30
blackburnsonney2k: I don't know, asking21:30
@sonney2kblackburn, anyone can use code from shogun for their purpose and release the software under gpl terms21:32
blackburnsonney2k: I know but I don't like this way of develop21:33
blackburnI know it is not possible to interface to any library we want to have to21:33
blackburnbut simply don't like :)21:33
@sonney2kblackburn, true - but I spend like a month or so discussing with the MB guys and we simply have different ideas of how things should work21:34
blackburnI see21:34
@sonney2kI even merged multiboost at some stage21:34
@sonney2kwrote swig wrappers etc21:34
@sonney2kbut they do a lot of things very differently and the project is big too21:35
@sonney2kso in the end I gave up21:35
@sonney2k(due to lack of time to pursue this way too involved endeavor)21:36
blackburnbad21:36
@sonney2knot bad good21:42
@sonney2k!21:42
blackburn:)21:42
@sonney2kthis way we made some progress instead of wasting time for endless communication :)21:43
blackburnsonney2k: do you know what is faster: computing svd of A OR AA' and computing of eigenvectors?21:44
@sonney2kno idea21:45
@sonney2kwhat is the complexity of svd / eig ?21:45
blackburndon't know, I'm worried about AA' step21:46
blackburnit is n^321:46
blackburnbut SVD for 3000x3000 took 236s21:46
blackburntoo bad for shogun :)21:46
@sonney2kblackburn, if we had numpy array interface compatibility one could do things like21:50
@sonney2kx=RealFeatures(sth)21:50
@sonney2kx+=321:50
blackburnsonney2k: fantastic21:50
blackburnwe definitely should have it21:50
@sonney2kand even any normal numpy operations21:50
@sonney2kit is very easy to do21:50
@sonney2kwe only need to provide a dict called __array_interface__21:51
@sonney2kwith these fields filled21:51
@sonney2khttp://docs.scipy.org/doc/numpy/reference/arrays.interface.html#__array_interface__21:51
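What such a dict could look like is sketched below in pure Python. `ToyFeatures` is a hypothetical stand-in backed by an `array.array`, not Shogun's `RealFeatures`, and the sketch assumes a C-contiguous float64 buffer. NumPy then wraps the memory without copying, so in-place operations on the NumPy view write through to the container.

```python
import array
import sys

import numpy as np

class ToyFeatures:
    """Hypothetical feature container that owns a flat buffer of doubles."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self._buf = array.array("d", [0.0] * (rows * cols))

    @property
    def __array_interface__(self):
        address, _ = self._buf.buffer_info()
        byteorder = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,
            "shape": (self.rows, self.cols),
            "typestr": byteorder + "f8",   # 8-byte float in native byte order
            "data": (address, False),      # (pointer, read-only flag)
        }

feats = ToyFeatures(2, 3)
view = np.asarray(feats)   # no copy: the view shares the container's memory
view += 3                  # writes through to feats._buf
print(list(feats._buf))
```

Writing `feats += 3` directly on the container would presumably still require the class to define `__iadd__` or to delegate to such a view; the dict alone only tells NumPy how to read the memory.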
@sonney2kI guess that is what is needed to directly work with scikits.learn21:54
blackburnhow?21:55
blackburnoops..21:57
blackburnsonney2k: would you mind to place dimreduction techniques in another folder/module?21:58
@sonney2kwhich and how would it communicate with preprocessors?21:59
blackburnI had some idea but I forgot22:00
blackburnah22:00
blackburnthere could be a Machine for this purposes22:00
blackburnand some Preprocessor proxy22:00
* sonney2k starts to implement the array interface22:00
blackburnI don't know if it is better hmm22:00
serialhexblackburn: drive by raspberry!!! :P22:45
blackburnserialhex: hi22:46
blackburn:)22:46
serialhexhow are you??22:46
blackburnfine22:52
blackburnand you?22:52
blackburnshit, still slow23:06
blackburnsonney2k: AA' + eigenvectors is faster in practice23:10
@sonney2kok23:11
blackburn~260s vs ~60s23:11
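The two routes compared here give the same left singular vectors in exact arithmetic: if A = U S V', then AA' = U S^2 U'. A small NumPy sketch of that equivalence follows; the random matrix is illustrative only, and it does not reproduce the timings quoted above, which depend on the LAPACK build and matrix size.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))

# Route 1: full SVD of A.
U, s, Vt = np.linalg.svd(A)

# Route 2: form the symmetric matrix AA' and take its eigendecomposition.
gram = A @ A.T
evals, evecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to match the SVD (descending)

# Eigenvalues of AA' are the squared singular values of A.
print(np.allclose(evals, s ** 2))
# Each left singular vector u_i satisfies (AA') u_i = s_i^2 u_i.
print(np.allclose(gram @ U, U * s ** 2))
```

One caveat with the AA' route: forming the product squares the condition number, so the smallest singular values are recovered less accurately than with a direct SVD.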
CIA-3shogun: Sergey Lisitsyn master * r544c920 / src/shogun/preprocessor/DiffusionMaps.cpp : Improved performance and fixed Diffusion Maps - http://git.io/W3e1wg23:12
blackburnsonney2k: one unresolved problem still: how to use preprocessors that are possible to apply both to strings and simplefeatures23:16
blackburnin kernel LLE I did it with returning new feature matrix if given features are not simple23:17
@sonney2kthat is not what preprocs were intended for23:23
blackburnsonney2k: I know, but it is really useful when embedding strings into euclidean space23:24
@sonney2ka workaround would be to introduce obtain_from_functions that do get type X as argument and return type Y23:24
blackburnehh?23:24
@sonney2kbtw, the array interface should no longer be used but instead http://docs.python.org/dev/c-api/buffer.html#Py_buffer23:25
@sonney2kCSimpleFeatures<float64_t> obtain_from_string(CStringFeatures<char>)23:26
@sonney2kor so23:26
@sonney2kor instead of CStringFeatures* just CKernel23:26
blackburnapply_to_string_features?23:26
blackburnno, kernel is worse23:26
@sonney2knot necessarily23:27
blackburnbecause every dimreduction preprocessor has its own kernel23:27
@sonney2kit encapsulates the feature type23:27
@sonney2kfor example for kPCA it would be ok23:27
blackburnthere is already apply_to_string_features for kPCA23:27
@sonney2kyeah but there is not apply_to_sparse_features and not apply_to_whatever23:29
blackburnhmm yes23:29
blackburnobtain_features23:29
@sonney2k_from_generic_kernel23:29
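The idea of obtaining real-valued features from a generic kernel can be sketched with kernel PCA on a precomputed kernel matrix. Everything named here is hypothetical: `bigram_kernel_matrix` is a toy string kernel and `embed_from_kernel_matrix` is an illustrative function, not an existing or planned Shogun method. The point is only that the embedding step sees the kernel matrix and never the feature type.

```python
from collections import Counter

import numpy as np

def bigram_kernel_matrix(strings):
    """Toy string kernel: inner product of character-bigram count vectors."""
    counts = [Counter(s[i:i + 2] for i in range(len(s) - 1)) for s in strings]
    n = len(strings)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = sum(counts[i][g] * counts[j][g] for g in counts[i])
    return K

def embed_from_kernel_matrix(K, target_dim):
    """Kernel PCA embedding of the training points from a kernel matrix alone."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K_centered = H @ K @ H
    evals, evecs = np.linalg.eigh(K_centered)
    order = np.argsort(evals)[::-1][:target_dim]
    evals, evecs = evals[order], evecs[:, order]
    # Clip tiny negative eigenvalues caused by rounding before taking the root.
    return evecs * np.sqrt(np.maximum(evals, 0.0))

strings = ["ACGT", "ACGA", "TTTT", "TTGT"]
K = bigram_kernel_matrix(strings)
embedding = embed_from_kernel_matrix(K, target_dim=2)   # one row per string
print(embedding.shape)
```

Swapping in a kernel over sparse or dense features would leave `embed_from_kernel_matrix` unchanged, which is the sense in which the kernel encapsulates the feature type.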
blackburnI would even add this method to dimreductionpreprocessor interface23:29
@sonney2kI don't know but I need to sleep now23:30
@sonney2kcu23:30
blackburnsee you23:30
--- Log closed Sun Sep 25 00:00:25 2011