--- Log opened Tue Feb 14 00:00:19 2012
-!- blackburn [~qdrgsm@188.168.4.209] has quit [Quit: Leaving.]  02:21
-!- dfrx [~f-x@inet-hqmc07-o.oracle.com] has joined #shogun  05:39
-!- n4nd0 [~n4nd0@s83-179-44-135.cust.tele2.se] has joined #shogun  07:47
-!- n4nd0 [~n4nd0@s83-179-44-135.cust.tele2.se] has quit [Quit: Leaving]  07:53
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun  07:53
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Quit: leaving]  09:27
-!- karlnapf [~heiko@host86-180-120-146.range86-180.btcentralplus.com] has joined #shogun  11:43
<karlnapf> sonney2k, around?  11:44
<sonne|work> karlnapf: yes  11:48
<karlnapf> sonne|work, hi, nice to see that you are still alive :) hope everything is going well. Do you have a minute?  11:49
<sonne|work> we will soon go for lunch, but before that, yes  11:49
<sonne|work> (and afterwards too)  11:50
<karlnapf> ok, then quickly:  11:50
<karlnapf> it's currently not possible to do model selection with custom kernels  11:50
<sonne|work> that is true  11:50
<karlnapf> since the splitting is done on the features  11:50
<sonne|work> ok  11:51
<karlnapf> and I think the only way to do this is to make it possible to specify indices for training  11:51
<karlnapf> (and applying)  11:51
<karlnapf> but for this, the data has to stay fixed  11:51
<sonne|work> true  11:51
<karlnapf> with the current apply() methods this does not work  11:52
<karlnapf> since apply() works on all features, and apply(CFeatures) changes the features  11:52
<karlnapf> same thing for train  11:52
<sonne|work> but if you set the subset before, it would work, right?  11:52
<karlnapf> a subset on the features?  11:52
<sonne|work> yes  11:52
<karlnapf> but the custom kernel has no features  11:53
<karlnapf> the values it returns are not based on features  11:53
<sonne|work> it has DummyFeatures  11:53
<karlnapf> I know, but the way it returns the values does not involve these  11:54
<sonne|work> (a NOP just returning #lhs / #rhs)  11:54
<karlnapf> I worked around this in my repo by setting a subset not on the features but on the kernel  11:54
<sonne|work> but one could change this, right?  11:54
<sonne|work> argh  11:54
<sonne|work> the others are leaving just now  11:54
<karlnapf> oh  11:55
<karlnapf> ok :)  11:55
<sonne|work> so I hope you are still around in 1 hour?  11:55
<karlnapf> well, let's continue later then :)  11:55
<karlnapf> probably, but not completely sure  11:55
<karlnapf> see you then  11:55
-!- dfrx [~f-x@inet-hqmc07-o.oracle.com] has left #shogun []  11:59
-!- n4nd0 [~n4nd0@n145-p102.kthopen.kth.se] has joined #shogun  12:01
-!- Netsplit *.net <-> *.split quits: @sonney2k, CIA-18, shogun-buildbot_  12:32
-!- Netsplit over, joins: shogun-buildbot_, CIA-18, @sonney2k  12:32
<sonne|work> karlnapf: Re  13:04
<CIA-18> shogun: Soeren Sonnenburg master * rcda7657 / src/shogun/mathematics/Math.h : minor source code beautification - http://git.io/zLy5Kw  13:12
<CIA-18> shogun: Soeren Sonnenburg master * ra044c79 / src/shogun/lib/DataType.h : introduce clone() function to SGVector - http://git.io/tclRpQ  13:12
<CIA-18> shogun: Soeren Sonnenburg master * r107e9f9 / (src/shogun/features/Labels.cpp src/shogun/features/Labels.h):  13:12
<CIA-18> shogun: Remove m_num_classes from CLabels and change get_num_(unique,)classes()  13:12
<CIA-18> shogun: This adds a more efficient mechanism to determine the number of classes.  13:12
<CIA-18> shogun: Instead of adding labels to a set, perform a sort + unique count. This  13:12
<CIA-18> shogun: fixes an issue when using labels for regression (and crazily slow  13:12
<CIA-18> shogun: behaviour). - http://git.io/xR86Sw  13:12
<CIA-18> shogun: Soeren Sonnenburg master * r7cf941e / (2 files): fix warning in multiclass accuracy - http://git.io/Lqmb0Q  13:12
<karlnapf> sonne|work hi  13:15
<sonne|work> karlnapf: hi :)  13:16
<karlnapf> so, where were we?  13:16
<karlnapf> so your point is to modify the CustomKernel so that it somehow maps the subset indices of the underlying features  13:17
<karlnapf> then model selection would work with it  13:17
<karlnapf> still, it would be nice if you could tell the x-val to reuse a kernel matrix in different runs  13:19
<karlnapf> but the problem is that it only has a CMachine reference  13:19
<karlnapf> so no kernels at all there.  13:19
<karlnapf> sonne|work don't you think this might be handy? Then one wouldn't have to precompute all the matrices during model selection by hand beforehand, but just pass a flag or so  13:26
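[editor's note: the idea discussed above — doing cross-validation on one fixed, precomputed kernel matrix by selecting row/column index sets per fold, instead of recomputing the kernel — can be sketched as follows. This is a plain-Python illustration with made-up helper names, not shogun's API.]

```python
# Sketch (not shogun API): cross-validation over a fixed, precomputed
# kernel matrix. The matrix is computed once; each fold merely selects
# sub-blocks of it via index lists, which is what "specify indices for
# training (and applying)" amounts to.

def submatrix(K, rows, cols):
    """Select the sub-kernel-matrix with the given row/column indices."""
    return [[K[i][j] for j in cols] for i in rows]

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def crossvalidate(K, k=2):
    """Yield (train_gram, test_gram) per fold from one precomputed K."""
    n = len(K)
    for test in kfold_indices(n, k):
        train = [i for i in range(n) if i not in test]
        # train gram: training rows x training cols (for train())
        # test gram: test rows x training cols (for apply())
        yield submatrix(K, train, train), submatrix(K, test, train)
```

The point of the design: `K` never changes, so repeated x-val runs (e.g. over different values of C) pay the kernel computation only once.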
-!- n4nd0 [~n4nd0@n145-p102.kthopen.kth.se] has quit [Quit: Leaving]  13:27
-!- nando [~nando@n145-p102.kthopen.kth.se] has joined #shogun  13:27
-!- nando is now known as n4nd0  13:28
<sonne|work> karlnapf: sorry, got interrupted  13:37
<karlnapf> np  13:37
<karlnapf> I have to leave in about 20 min; if you are busy until then, let's just discuss by mail/github  13:39
<sonne|work> karlnapf: I guess from the user perspective it is  13:39
<sonne|work> all a user wants is to give the thing data and get a result as fast as possible - do whatever is necessary in between  13:40
<sonne|work> so this means precompute matrix / re-use kernel cache etc.  13:40
<karlnapf> yes  13:41
<sonne|work> problem is that it makes things very complex  13:42
<sonne|work> karlnapf: don't you think that this becomes a bit too tough to control?  13:42
<karlnapf> how do you mean that?  13:42
<karlnapf> I mean, all that has to be done is to replace the current kernel by a custom kernel with a precomputed matrix and restore it afterwards  13:43
<karlnapf> the rest of the framework just uses the kernel as normal and does not even know that it's now a custom kernel  13:43
<karlnapf> I mean, you are right, this adds complexity  13:44
<sonne|work> yeah, but what happens if the training process is interrupted?  13:44
<sonne|work> ctrl+c  13:44
<karlnapf> and then?  13:44
<sonne|work> then the object has a CustomKernel assigned to it  13:45
<karlnapf> oh, yes, but if this locking would be used  13:45
<sonne|work> not the one it had before  13:45
-!- wiking [~wiking@huwico/staff/wiking] has quit [Read error: Connection reset by peer]  13:45
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun  13:45
<karlnapf> it should not be possible to do any of the normal stuff if the machine is locked  13:45
<karlnapf> and unlocking restores the old configuration  13:46
<sonne|work> I understand, hmmhh. so let's summarize the benefits  13:46
<karlnapf> for example, train is only implemented in CMachine  13:46
<karlnapf> there could be a check  13:46
<sonne|work> hmmhh, and the kernel machine could do the kernel precomputing (if e.g. < 5000 examples / sufficiently many kernel matrices need computation)  13:47
<sonne|work> otherwise attempt kernel matrix cache re-use  13:47
<karlnapf> I don't have much of an overview of the kernel cache  13:49
<karlnapf> but yes, the precomputation is done in KernelMachine  13:49
<karlnapf> other machines could do different things (don't have a case in mind yet)  13:49
<karlnapf> this also automatically makes x-val over custom kernels possible  13:50
<karlnapf> because you can simply create a custom kernel from a custom kernel  13:50
<sonne|work> but this requires twice the memory then, right?  13:51
<karlnapf> no, just use the same  13:51
<sonne|work> ahh, you get just the ptr for the custom kernel, right? not a copy  13:51
<sonne|work> ?  13:51
<karlnapf> yes  13:51
<karlnapf> BTW, do you know what's this about with the float32 matrices in the custom kernel?  13:51
<karlnapf> because that causes problems with the above procedure  13:52
<sonne|work> I mean, get_kernel_matrix returns a copy  13:52
<sonne|work> ohh  13:52
<sonne|work> for efficiency  13:52
<karlnapf> k  13:52
<sonne|work> CustomKernel tries to save memory  13:52
<karlnapf> well, CustomKernel would have to override the get_kernel_matrix method  13:52
<sonne|work> that is why float instead of double  13:52
<karlnapf> ok  13:52
<sonne|work> so we need a copy too then :(  13:52
<karlnapf> no  13:52
<karlnapf> the CustomKernel can be created from a float32  13:53
<sonne|work> yeah, but when you do get_kernel_matrix?  13:53
<sonne|work> ok, you could get a float32 variant  13:53
<sonne|work> should be ok, I guess  13:53
<karlnapf> yes, also get_kernel_matrix is only called from a CustomKernel that wants to have the same matrix  13:53
<karlnapf> no calling of this method from outside  13:53
<karlnapf> if it is, a copy has to be returned, but this does not really make sense  13:54
<karlnapf> I added this method to CustomKernel  13:54
<karlnapf> SGMatrix<float32_t> get_kernel_matrix()  13:55
<karlnapf> {  13:55
<karlnapf>     return kmatrix;  13:55
<karlnapf> }  13:55
<sonne|work> I see  13:55
<karlnapf> then when the constructor CCustomKernel(CKernel*) is used, this is called  13:55
<sonne|work> for custom kernels...  13:55
<karlnapf> the only problem is with the free in the destructor, because we don't want the second custom kernel to delete the original matrix  13:55
<karlnapf> yes, this is just to make the x-val work on custom kernels  13:56
<sonne|work> argh, yes  13:56
<sonne|work> I wonder if it was the right decision to not have reference counts for SGVector/SGMatrix objects  13:57
<sonne|work> instead of the do_free flag  13:57
<karlnapf> yes, the counting would be nice here :)  13:57
<sonne|work> there are other places where this is annoying  13:57
<karlnapf> I don't like the do_free too much since I never know whether it's used cleanly, so I tend to use the destroy_vector method and make sure that I know what's going on (where the vector comes from etc.)  13:58
<karlnapf> but anyway  13:58
<karlnapf> the matrix could only be deleted if a flag is set or so  13:59
<sonne|work> which means we should replace it by ref counts  13:59
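[editor's note: the ownership problem above — two CustomKernels sharing one matrix, where a single do_free flag cannot tell which holder may free it — is exactly what reference counting solves. A minimal plain-Python illustration with hypothetical names, not shogun code:]

```python
# Illustration (hypothetical, not shogun code): a shared matrix with a
# reference count frees its data exactly once, when the last owner is
# released. A lone do_free flag cannot express "someone else still
# needs this memory", which is the destructor problem discussed above.

class RefCountedMatrix:
    freed = []  # record of freed buffers, for demonstration only

    def __init__(self, data):
        self.data = data
        self.refcount = 1

    def ref(self):
        """Another owner (e.g. a second CustomKernel) shares the data."""
        self.refcount += 1
        return self

    def unref(self):
        """Release one owner; free the underlying data only at zero."""
        self.refcount -= 1
        if self.refcount == 0:
            RefCountedMatrix.freed.append(self.data)
            self.data = None
```

With this scheme, destroying the first kernel leaves the shared matrix intact for the second, and the memory is released exactly once.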
<karlnapf> why don't you use a matrix class, btw?  13:59
<sonne|work> so the current x-val will call CMachine.data_lock?  13:59
<karlnapf> no  14:00
<sonne|work> you mean sth else instead of SGMatrix?  14:00
<karlnapf> yes  14:00
<sonne|work> who has to lock the data?  14:00
<karlnapf> two possibilities:  14:00
<sonne|work> shouldn't x-val do it?  14:00
<karlnapf> no, because sometimes you want to do multiple x-vals on the same kernel (for example for C)  14:00
<karlnapf> so either the grid-search class does it for every parameter change  14:01
<sonne|work> Re SGMatrix - this is only meant as a simple struct to store the data plus some minor helpers, no real matrix class (we use blas etc. for that underneath wherever possible)  14:01
<karlnapf> or the user (optionally), if he only wants to do x-val  14:01
<karlnapf> ah ok  14:01
<karlnapf> xval just checks whether the data is locked and then uses the corresponding train/apply methods  14:02
<sonne|work> but the missing reference count is really a design flaw  14:02
<karlnapf> yes, it would be nice to have, but I guess this is a lot of work  14:02
<sonne|work> not too much, but I don't have any time :(  14:03
<sonne|work> so is it best to do it in grid search?  14:03
<karlnapf> yes, I think so  14:03
<karlnapf> one could even only do it if a kernel parameter has changed, but that's too complicated for now, I think  14:03
<sonne|work> I mean, it would be nice to do the locking automagically (optionally forced on/off of course, some enum MODSEL_LOCK_AUTO / OFF / ON)  14:04
<karlnapf> yes  14:04
<karlnapf> that would be nice  14:04
<karlnapf> for example, if train_locked is called, it's locked automatically  14:04
<karlnapf> but the problem is  14:04
<karlnapf> that the unlocking has to be done by hand  14:04
<sonne|work> nahh  14:04
<karlnapf> because the machine does not know when something has changed  14:04
<sonne|work> doesn't the grid-search on top know?  14:05
<karlnapf> yes  14:05
<karlnapf> it currently does a lock before evaluation and an unlock after  14:05
<karlnapf> (if the flag was set)  14:05
<karlnapf> otherwise it just does it the old way  14:05
<sonne|work> so it should set the flag itself  14:06
<karlnapf> yes, it does  14:06
<karlnapf> Gridsearch gets a boolean whether it should lock, then everything is done internally  14:06
<karlnapf> xval does not  14:07
<karlnapf> since the user may have locked before  14:07
<karlnapf> if you just want to evaluate a machine, you lock it and then perform xval  14:07
<karlnapf> if xval would always lock, then you would have double computations  14:07
<karlnapf> also:  14:08
<karlnapf> imagine you have a kernel which does not change but svm params that do, and you want to do a search  14:08
<sonne|work> couldn't xval test if things are locked and only then lock?  14:08
<karlnapf> then you lock the machine before and then tell grid-search not to lock  14:08
<karlnapf> the kernel is not recomputed, but the machine is locked all the time, so it's fast  14:08
<karlnapf> yes, it could  14:09
<karlnapf> I mean, all kinds of user-friendly stuff could be added for this  14:09
<karlnapf> if it would automatically lock if not yet locked  14:09
<karlnapf> grid-search would just have to unlock after each iteration  14:09
<sonne|work> ok - I would want to hide all this locking from the user  14:09
<sonne|work> and only add some property to gridsearch/xval that one can manually set  14:10
<sonne|work> to override our decision  14:10
<karlnapf> yes, one flag in the select_model method  14:10
<karlnapf> the old examples all still run without any changes (except for some signatures)  14:10
<karlnapf> if somebody does not know about all this, everything is as it was  14:11
<karlnapf> about hiding the stuff: since the user has to understand when locking is good and when it's not, I like that it has to be done manually  14:11
<sonne|work> well, we should make the best guess  14:11
<karlnapf> for example in the case of a fixed-kernel search for SVM C  14:12
<karlnapf> the locking would be stupid  14:12
<karlnapf> since the kernel does not change  14:12
<sonne|work> so for many parameter combinations / small kernels  14:12
<karlnapf> I mean locking in every iteration  14:12
<karlnapf> yes  14:12
<sonne|work> yeah, but you know this in grid search  14:12
<karlnapf> yes, so just lock manually once before and then tell grid-search not to lock  14:13
<sonne|work> wait - it makes sense there to precompute the kernel  14:13
<karlnapf> yes, but only once  14:13
<sonne|work> but only once, right?  14:13
<karlnapf> not for every C  14:13
<sonne|work> yeah, but this we need to determine automagically somehow  14:13
<karlnapf> how?  14:13
<sonne|work> locking only makes sense if the features change  14:14
<sonne|work> so do it once for a constant set of features  14:14
<karlnapf> the problem is that the model selection does not distinguish between kernel parameters and machine parameters  14:14
<karlnapf> I thought of extracting the parameter combinations where the kernel params are fixed and then only locking for these  14:15
<karlnapf> but then you would have to add knowledge about possible subclasses there  14:15
<karlnapf> what about other machines, how do they do the locking?  14:15
<karlnapf> for example, we also have the distance matrices  14:15
<karlnapf> that's why I preferred the manual locking there  14:16
<sonne|work> seems like we need another flag - "parameter changes data representation"  14:16
<sonne|work> manual locking is very tough for the user to get right  14:16
<sonne|work> x-val/grid search is already pretty complicated  14:16
<sonne|work> so we should hide that stuff if possible  14:17
<karlnapf> in the basic case he does not have to do anything  14:17
<karlnapf> I mean for model selection  14:17
<karlnapf> only if he uses x-val manually  14:17
<karlnapf> and if he wants to save more time during model selection  14:17
<karlnapf> the manual x-val case could be done automatically though  14:17
<karlnapf> x-val should always (flag) try to lock if it's not done yet  14:18
<sonne|work> exactly  14:18
<karlnapf> ah, but what about the unlocking?  14:18
<karlnapf> this cannot be done automatically for xval  14:18
<sonne|work> unlock if it has locked it  14:18
<sonne|work> why not?  14:18
<karlnapf> because of the case where a grid-search was performed on locked data  14:19
<karlnapf> then the kernel would always be recomputed  14:19
<sonne|work> yeah, but it didn't lock it itself then  14:19
<karlnapf> even though it does not change  14:19
<karlnapf> oh, yes  14:19
<karlnapf> should be ok then  14:20
<sonne|work> x-val just stores that it has to unlock later  14:20
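[editor's note: the policy the two converge on — x-val locks only if the machine is not already locked, and unlocks afterwards only if it did the locking itself — can be summarized in a few lines. Hypothetical names, plain Python, not shogun's API:]

```python
# Sketch of the agreed locking policy (hypothetical, not shogun API):
# cross-validation locks the machine only if nobody locked it before,
# and unlocks only what it locked itself. A machine the caller locked
# up front (e.g. for a fixed kernel during grid search over C) stays
# locked across runs, so the expensive precomputation happens once.

class Machine:
    def __init__(self):
        self.locked = False
        self.lock_count = 0   # counts how often the expensive lock ran

    def data_lock(self):
        self.locked = True
        self.lock_count += 1  # stands in for precomputing the kernel

    def data_unlock(self):
        self.locked = False

def cross_validate(machine, runs=3):
    for _ in range(runs):
        we_locked = not machine.locked
        if we_locked:             # lock only if not locked already
            machine.data_lock()
        # ... train_locked / apply_locked on the fold indices ...
        if we_locked:             # unlock only what we locked ourselves
            machine.data_unlock()
```

Run it on an unlocked machine and the lock fires once per run; lock the machine yourself first and x-val never re-locks, which is exactly the "no double computations" case from the discussion.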
<CIA-18> shogun: Soeren Sonnenburg master * r3649f3b / (31 files in 9 dirs):  14:20
<CIA-18> shogun: Merge pull request #367 from karlnapf/master  14:20
<CIA-18> shogun: A draft for training on fixed kernel matrices/data in general - http://git.io/qv89yQ  14:20
<karlnapf> ok, I will change the stuff we talked about soon  14:21
<sonne|work> we don't have to work it all out at once - step by step is ok.  14:21
<karlnapf> yes  14:21
<karlnapf> I have locally added parallelization of apply_locked  14:21
<karlnapf> in KernelMachine  14:22
<sonne|work> I would love to have this much easier nested-list way of specifying how grid search has to be done (from the python side)  14:22
<karlnapf> yes  14:22
<karlnapf> scikit is nice there  14:22
<wiking> sonne|work: just sent you my benchmark results... if you have any better idea how to do the benchmarking, let me know...  14:22
<sonne|work> yes - not as powerful, but nice  14:22
<karlnapf> but the framework itself is not as nice as ours :)  14:22
<karlnapf> yes,  14:22
<karlnapf> but this could be done for us  14:23
<sonne|work> and after all, the most common case is what counts  14:23
<karlnapf> say, another thing  14:23
<sonne|work> so if we make it simple for the most common cases  14:23
<sonne|work> things would be good enough  14:23
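[editor's note: the "nested-list way of specifying grid search" mentioned above might look like the following — a hypothetical sketch in the spirit of scikit-learn's param_grid, not shogun's CModelSelectionParameters tree:]

```python
# Hypothetical sketch of a flat, declarative grid-search specification
# (similar in spirit to scikit-learn's param_grid): a dict mapping
# parameter names to candidate values is expanded into every
# combination, one dict per training run.
from itertools import product

def expand_grid(grid):
    """{'C': [0.1, 1.0], 'width': [2.0]} -> list of single-value dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]
```

The appeal for the common case is that the user writes one literal dict instead of building a parameter tree by hand; the library walks the combinations internally.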
<karlnapf> I try to use shogun for university projects, and I really run across a lot of bugs/segfaults/missing error messages  14:23
<karlnapf> from python  14:23
<karlnapf> for example, in MultiClassSVM I found code which was wrong but never noticed, since it was never used before  14:24
<karlnapf> so I thought, what about separating the examples and the tests  14:24
<karlnapf> and adding some more tests which try to cover all the code  14:24
<karlnapf> at least for new stuff  14:24
<sonne|work> wiking: why don't you just compute a kernel matrix that has, say, 10000 * 10000 elements - single-threaded maybe. Then you could directly measure.  14:24
<karlnapf> ?  14:25
<sonne|work> karlnapf: I used MultiClassSVM just fine?!? and so did blackburn  14:25
<sonne|work> what happened?  14:25
<karlnapf> use apply(int32_t)  14:25
<sonne|work> karlnapf: we don't have the resources for splitting up examples / tests  14:25
<sonne|work> too much work  14:25
<wiking> sonne|work: this seemed to be much faster, as this is really just about that function and its implementation w/o any wrapping stuff...  14:26
<sonne|work> we could have more examples and enable the tests  14:26
<karlnapf> two problems: a) it does not set the kernel on the svm before applying, b) it does not initialise the votes vector with zeros, so if one class gets zero votes you return uninitialized memory  14:26
<sonne|work> karlnapf: I never used apply(int32_t) (neither did blackburn :)  14:26
<sonne|work> pretty simple explanation :)  14:26
<karlnapf> that's what I meant with code-coverage tests :)  14:27
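[editor's note: the uninitialized-votes bug described above is worth a tiny illustration. This is an illustrative one-vs-one voting sketch, not the actual MultiClassSVM code:]

```python
# Illustrative sketch (not the actual MultiClassSVM code): one-vs-one
# multiclass voting must start from an explicitly zeroed vote vector.
# Without the zero initialisation, a class that receives no votes would
# be compared against whatever garbage memory happened to hold.

def one_vs_one_predict(pairwise_outputs, num_classes):
    """pairwise_outputs maps (a, b) with a < b to +1 (a wins) or -1 (b wins)."""
    votes = [0] * num_classes   # the fix: initialise votes with zeros
    for (a, b), out in pairwise_outputs.items():
        votes[a if out > 0 else b] += 1
    # argmax over votes; a zero-vote class now has a defined value
    return max(range(num_classes), key=lambda c: votes[c]), votes
```

This is also a good example of why a code-coverage test for `apply(int32_t)` would have caught the bug immediately.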
<karlnapf> anyway, I have to go now  14:27
<karlnapf> nice discussion, see you later! :)  14:27
<sonne|work> karlnapf: well, if we had an example for apply(int)  14:27
<sonne|work> it would have shown the bug in the first place  14:27
<sonne|work> but we don't...  14:27
<karlnapf> yes, I will add one :)  14:27
<sonne|work> thx  14:27
<sonne|work> cu  14:27
<karlnapf> there are more bugs in MultiClassSVM; I am just fixing them because I need to use it for uni :)  14:27
<karlnapf> bye bye!  14:28
<sonne|work> karlnapf: that is how it should be  14:28
<sonne|work> I am doing the same in code you and others touched :D  14:29
<karlnapf> ok, that's open source then :D  14:30
-!- n4nd0 [~nando@n145-p102.kthopen.kth.se] has quit [Quit: leaving]  14:42
-!- karlnapf [~heiko@host86-180-120-146.range86-180.btcentralplus.com] has left #shogun []  14:52
-!- n4nd0 [82ede32a@gateway/web/freenode/ip.130.237.227.42] has joined #shogun  16:12
-!- n4nd0 [82ede32a@gateway/web/freenode/ip.130.237.227.42] has quit [Quit: Page closed]  16:58
<CIA-18> shogun: Soeren Sonnenburg master * r3e3db13 / (33 files in 14 dirs):  17:38
<CIA-18> shogun: add linear least squares and ridge regression  17:38
<CIA-18> shogun: This adds linear ridge regression and a convenience class  17:38
<CIA-18> shogun: CLeastSquaresRegression calling CLinearRidgeRegression with  17:38
<CIA-18> shogun: regularization parameter tau=0. To not cause confusion, KRR is  17:38
<CIA-18> shogun: renamed to KernelRidgeRegression throughout examples/code. - http://git.io/pQJ0OA  17:38
<sonne|work> wiking: https://gist.github.com/1828072  17:53
<wiking> checking  17:53
<wiking> ok, so checking :)  17:53
<sonne|work> wiking: there really is no difference; sometimes one wins, sometimes the other, but really only by a few seconds  17:53
<sonne|work> wiking: btw, you need to use gettimeofday to get high-res timings  17:53
<wiking> heheheh  17:55
<wiking> so, a tie? :)  17:55
<wiking> none of us gets a drink? :P  17:55
<sonne|work> JS1: 134.6803036  17:55
<sonne|work> JS2: 134.7876117  17:55
<sonne|work> here  17:55
<wiking> anyhow, I was meaning to ask if you guys would still be interested in a latent/structural svm implementation..  17:55
<sonne|work> but when I run it a couple of times, it might very well be vice versa  17:55
<wiking> well, I guess that's when the pipeline kicks in  17:57
<sonne|work> wiking: the problem might be that alex has no time to mentor...  17:57
<sonne|work> I need to work on that gsoc ideas list (I already have a couple of submissions from mentors) ... structured output learning / multiclass / optimization framework would be what I would want to see this year, but I am not supermentoring this year :)  17:59
<sonne|work> anyway, gtg  17:59
<sonne|work> cu  17:59
<wiking> cya  18:00
-!- Netsplit *.net <-> *.split quits: @sonney2k, shogun-buildbot_, CIA-18  18:55
-!- Netsplit over, joins: CIA-18, @sonney2k  18:56
-!- shogun-t1olbox [~shogun@7nn.de] has quit [Ping timeout: 260 seconds]  19:09
--- Log closed Tue Feb 14 19:09:54 2012
--- Log opened Tue Feb 14 19:10:01 2012
-!- shogun-toolbox [~shogun@7nn.de] has joined #shogun  19:10
-!- Irssi: #shogun: Total of 8 nicks [1 ops, 0 halfops, 0 voices, 7 normal]  19:10
-!- Irssi: Join to #shogun was synced in 7 secs  19:10
-!- Netsplit *.net <-> *.split quits: shogun-t1olbox  19:15
-!- blackburn1 [~qdrgsm@188.168.4.71] has joined #shogun  19:23
-!- Netsplit *.net <-> *.split quits: blackburn  19:27
-!- blackburn1 [~qdrgsm@188.168.4.71] has quit [Ping timeout: 248 seconds]  19:44
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun  20:22
<n4nd0> hi there  20:37
<n4nd0> so, is gsoc a topic to start talking about yet?  20:37
<n4nd0> any topics that will be interesting this year? I have checked the ideas on the webpage for gsoc 2011  20:39
<shogun-buildbot_> build #149 of nightly_none is complete: Failure [failed compile]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_none/builds/149  21:05
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 240 seconds]  21:08
<shogun-buildbot_> build #135 of nightly_default is complete: Failure [failed compile]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_default/builds/135  21:15
<shogun-buildbot_> build #148 of nightly_all is complete: Failure [failed compile]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_all/builds/148  21:26
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]  21:42
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun  21:51
-!- nando [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun  22:11
-!- nando is now known as n4nd0  22:11
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]  23:33
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun  23:35
-!- blackburn [~qdrgsm@188.168.4.152] has joined #shogun  23:36
--- Log closed Wed Feb 15 00:00:19 2012