--- Log opened Mon Mar 05 00:00:19 2012
in high school, people that go into letters (the humanities) may study Latin and/or Greek 00:00
I knew about 200-300 words I guess 00:01
it helps sometimes: you can understand a word you have never seen because it is similar to some Latin 00:02
yeah 00:02
I had a teacher in Swedish who had some knowledge of Spanish and Italian 00:03
not enough for a fluent conversation, but he could read quite a bit 00:03
he claimed that he never studied those, just Latin 00:03
really curious! 00:04
like a meta-language 00:04
:) 00:04
uh, 3 am 00:05
I guess I have to sleep a little :) 00:05
oh, that's late 00:05
"just" 12 here 00:05
good night then 00:06
I wish it was 12 here 00:06
:) 00:06
good night 00:06
-!- blackburn [~qdrgsm@31.28.32.139] has quit [Quit: Leaving.] 00:06
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 276 seconds] 01:08
-!- axitkhurana [~akshit@14.98.227.233] has joined #shogun 01:47
-!- axitkhurana [~akshit@14.98.227.233] has left #shogun [] 01:47
-!- vikram360 [~vikram360@117.192.171.117] has quit [Read error: Connection reset by peer] 02:12
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun 08:29
shogun: Soeren Sonnenburg master * rd3f6438 / (4 files in 2 dirs): 09:07
shogun: Mahalanobis distance fixes 09:07
shogun: - use mean of all examples 09:07
shogun: - improve documentation 09:07
shogun: - serialization support - http://git.io/0kJS3w 09:07
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun 09:10
n4nd0: please have a look at my mahalanobis commit 09:11
this is what I meant - but I didn't have time to check it thoroughly, it would be great if you could do it 09:11
thanks! 09:11
sonne|work: sure, I will check it, give me some minutes 09:20
n4nd0: you basically did it like I had in mind, but missed computing the mean over both lhs/rhs, plus some minor issues (serialization / documentation) 09:21
sonne|work: I will take a look at it so I can do it better next time 09:22
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun 09:23
n4nd0: keep in mind that not everything I do is correct, so have a critical eye on it - I am open to discussion :) 09:23
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking] 09:30
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun 09:30
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 260 seconds] 09:35
sonne|work: oops! is the current build working properly? I just pulled and compiled, but the linker is complaining about a lot of things 10:26
multiple definitions of a lot of methods in shogun::MulticlassMachine 10:27
n4nd0: yes - do a git clean -dfx to erase all files not in the repository (warning...) 10:28
I deleted the .o files in multiclass and it worked 10:29
thank you :) 10:29
-!- blackburn [5bdfb203@gateway/web/freenode/ip.91.223.178.3] has joined #shogun 10:34
sonne|work: so one thing in your commit is the use of (l == r) 10:35
n4nd0: yeah, that's sufficient 10:36
sonne|work: that means the features are considered different even when they have the same values, if MahalanobisDistance is instantiated with different CSimpleFeatures objects 10:36
sonne|work: ah ok, no problem then 10:36
sonne|work: I answered ;) 10:38
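A minimal NumPy sketch of the "mean over both lhs/rhs" point discussed above: when lhs and rhs differ, the mean and covariance are estimated from all examples pooled together. This follows one common convention for the Mahalanobis distance (each rhs vector scored against the pooled distribution) and is not Shogun's actual code; the function name and pooling rule are illustrative.

    import numpy as np

    def mahalanobis_distance(lhs, rhs):
        # pool both feature sets to estimate the distribution, mirroring the
        # "use mean of all examples" fix; when lhs and rhs are the same
        # object, this is just the usual sample mean / covariance
        pooled = lhs if lhs is rhs else np.vstack([lhs, rhs])
        mean = pooled.mean(axis=0)
        icov = np.linalg.pinv(np.cov(pooled, rowvar=False))
        # distance of each rhs vector to the pooled distribution:
        # sqrt((x - mu)^T S^{-1} (x - mu))
        diff = rhs - mean
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, icov, diff))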
sonne|work: about ocas - it is working 10:39
you fixed it? 10:39
blackburn: ^ 10:39
nope, but for simple examples it was ok 10:40
I have to check my code 10:40
sonne|work: well, the test we have says it is ok 10:40
from tester_all I mean 10:40
I had another report from somebody else who also complained that it didn't work 10:40
blackburn: our oversight then 10:41
probably, I'll check later 10:41
sonne|work: about mc-liblinear - yes, it works 10:43
I even got better results on my data than with simple OvR liblinear 10:43
so 97 again now? 10:43
sonne|work: 96.8, but I didn't do model selection very well :) 10:46
pretty good anyway 10:46
the exact homogeneous map works well and I like it pretty much :) much better to work in linear spaces 10:47
sonne|work: I tested the results, they are right 10:52
sonne|work: it actually makes sense to use the whole data when l != r and take the mean over both distributions, sorry I didn't get you :S 10:54
blackburn: yeah, it is really fast later on! 10:54
n4nd0: yeah - I thought it is the same as for the cov / one should use lhs and rhs, if available, for the mean too 10:55
sonne|work: btw, I've added a rejection strategies class 10:55
I can't think of any rejection strategy that is not threshold based, but it is ok to keep it modular I think 10:56
add an example to show how it works... 10:56
yeah, I'll do it gradually, just in a bit of a rush 10:56
sonne|work: rejects are particularly important for me (e.g. actual accuracy can be measured w/o rejects and it should be ~1.0) 10:58
I have seen some SVMs that train with a reject option, but it would take time to implement.. 10:58
it is unclear though if you can gain a lot using this. I would suspect your simple thresholding works well enough for most cases :) 11:06
sonne|work: maybe with the assumption that the train set contains rejected vectors that should not turn the hyperplane around ;) 11:09
yeah, but you can already control that by giving different Cs to examples 11:10
true 11:11
of course you would need to know which examples could be problematic 11:11
probably the ones misclassified in a previous run :D 11:12
sonne|work: I had an idea (unrelated to classification) - can you imagine some python object that delegates some ops to lambdas? 11:12
an example: 11:12
PythonFeatures with get_feature_vector implemented in python 11:13
I have no idea how to get it done.. 11:13
? 11:13
sonne|work: imagine a Features instance with get_feature_vector/get_dim_feature_space/etc set to lambdas 11:14
I think it is impossible.. 11:14
I mean, it could be custom then 11:14
to lambdas? 11:14
yeah, to functions 11:15
I don't understand what you want to say 11:15
e.g. get_feature_vector = lambda x: some-sql-select 11:15
autogenerated features? 11:15
no, custom 11:15
formulas 11:15
where you can set operations 11:15
custom!?! 11:15
like you provide some python script? 11:15
yes-yes 11:16
that's easy 11:16
how? 11:16
just overload the get_feature_vector functions etc 11:16
(from python) 11:16
really? 11:16
will it work?? 11:16
for this to work you have to enable directors for swig though 11:16
do you find it useful? I do.. 11:16
well, I accidentally did that in the first swig based releases 11:16
things become very slow then 11:17
that's bad 11:17
so I would rather want a separate class just for that 11:17
then only this class gets director functionality 11:17
and get/set * can be overridden from $LANG 11:17
damn, I thought it was impossible 11:17
welcome to swig 11:18
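A rough pure-Python sketch of the idea just discussed - a features object that delegates vector access to a user-supplied callable, which is what overriding get_feature_vector from Python via a SWIG-director-enabled class would allow. The class here is hypothetical, not Shogun's API.

    import numpy as np

    class CallableFeatures:
        # features whose vectors are produced on demand by a user-supplied
        # callable, e.g. a lambda wrapping an SQL select or a formula
        def __init__(self, num_vectors, dim, get_vector):
            self.num_vectors = num_vectors
            self.dim = dim
            self._get_vector = get_vector

        def get_feature_vector(self, idx):
            return np.asarray(self._get_vector(idx), dtype=float)

        def get_dim_feature_space(self):
            return self.dim

    # usage: feature vectors computed on the fly from a formula
    feats = CallableFeatures(100, 2, lambda i: [i, i ** 2])
    print(feats.get_feature_vector(3))  # [3. 9.]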
sonne|work: another issue (do you have 2 more mins?) 11:18
you can overload a C++ method from $LANG 11:18
no 11:18
too bad, ok then, later 11:18
hmm, nevermind, useless suggestion (I thought of integrating lapack into shogun code) 11:19
blackburn: hey there! hope you are not too angry after the results of the elections 11:30
blackburn: I wanted to ask you one thing about QDA 11:33
blackburn: LDA in shogun is implemented with regularization, so I suppose that we are interested in regularized QDA, right? 11:33
n4nd0: not angry at all - let these people live with this guy ;) 11:35
n4nd0: is the regularization there some X + delta*I? 11:35
blackburn: do you mean in QDA or LDA? 11:36
both? :) 11:37
I just don't know what the regularization is there 11:37
as for your question - I just meant that it would possibly be pretty easy to make it regularized 11:37
or not? 11:37
I am not really sure right now 11:38
I am still reading documentation about it 11:38
but it seems to me that the method changes more than just a little when regularization is used 11:39
really? 11:41
n4nd0: I think the easiest way is to implement it just as it is in scikits ;) 11:42
blackburn: haha ok 11:42
I took a look there 11:43
but I didn't find documentation about how they do it 11:44
there is an example showing a couple of plots, and the code of course 11:44
looks pretty straightforward.. 11:49
what makes you unhappy? ;) 11:49
nothing :P 11:53
-!- sonne|work [~sonnenbu@194.78.35.195] has quit [Ping timeout: 276 seconds] 11:53
oh, we lost colonel sonnenburg 11:54
:D 11:54
n4nd0: http://s1-05.twitpicproxy.com/photos/large/531249569.png?key=890213 11:57
it is for real ;) 11:59
oh 12:00
I saw some percentages, but they were not that high 12:01
I saw something like 60-something percent for Putin on a 70-something percent turnout 12:01
ah, it is in Chechnya 12:01
a local region 12:01
haha, it is a big local region 12:02
it could almost be the capital of Sweden in terms of population 12:02
a small republic 12:02
I am guessing those numbers in black are # voters 12:03
yes 12:03
ah fuck, I didn't recognize the name at first sight 12:03
I recognize it as "Chechenia" 12:03
that is how we pronounce it in Spanish 12:03
there was a war there, as you probably know :) 12:04
yeah, that's why I remember the name 12:05
it appeared a lot in the news 12:05
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun 12:09
-!- vikram360 [~vikram360@117.192.190.106] has joined #shogun 12:16
blackburn: and putin wins 12:16
surprise? 12:17
:) 12:17
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 252 seconds] 12:31
nope.. but the media seems to be having a field day. 3000 official complaints about the voting. 12:35
blackburn: isn't QDA the same as LDA, but just on quadratic features? 13:17
sonne|work: what are quadratic features? 13:19
all monomials of degree 2 13:19
you probably know better? ;) 13:20
x_1*x_2, x_1^2, x_2^2 13:20
for 2d input vectors 13:20
sonne|work: well, we have no such features? 13:20
polynomialdotfeatures? 13:20
or sth? 13:20
PolyFeatures 13:21
ah 13:21
sonne|work: well, I don't know then, do you think QDA is useless? 13:21
anyway, it makes sense to make things explicit, i.e., if it is the same, use LDA on simplefeatures? 13:22
err, implement QDA on simplefeatures by using PolyFeatures internally 13:23
yeah, I got it 13:23
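A small NumPy sketch of the degree-2 expansion sonne|work describes (all monomials of degree 2, e.g. x_1^2, x_1*x_2, x_2^2 for 2d inputs); an illustration only, not Shogun's PolyFeatures implementation.

    import numpy as np
    from itertools import combinations_with_replacement

    def quadratic_features(X):
        # append all degree-2 monomials to each row of X
        monomials = [X[:, i] * X[:, j]
                     for i, j in combinations_with_replacement(range(X.shape[1]), 2)]
        return np.column_stack([X] + monomials)

    X = np.array([[1.0, 2.0]])
    print(quadratic_features(X))  # [[1. 2. 1. 2. 4.]] -> x1, x2, x1^2, x1*x2, x2^2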
-!- vikram360 [~vikram360@117.192.190.106] has quit [Read error: Connection reset by peer] 13:24
-!- sonne|work [~sonnenbu@194.78.35.195] has quit [Ping timeout: 276 seconds] 13:30
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun 13:38
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun 13:45
sonne|work: it took me one year to finally understand how features work here in shogun hahahah 13:47
blackburn: anyway, better check that LDA on squared features is QDA - could be that there is sth else to it :) 14:04
sonne|work: n4nd0 will probably do it ;) 14:04
sonne|work: I have seen an interesting thing in your talk 14:04
optimizing svm with auPRC 14:04
did you try to train an svm this way? 14:06
blackburn: doesn't help - look at T. Joachims' paper (best paper award at ICML) - gives you like 0.00000001% :) 14:06
blackburn: which talk? 14:06
sonne|work: http://sonnenburgs.de/soeren/talks/2006-05-02-perf-measures.pdf 14:06
ohh, that crap 14:07
probably all wrong 14:07
ah, I see 14:07
:D 14:07
I guess it is best to look at this one page in my thesis - all the perf measures I know of (and in shogun) are in there 14:07
I was interested in the svm on the last page 14:08
sonne|work: I can't understand why in mc svms the constraints are (<w_n,x> + b_n) - (<w_m,x> + b_m) >= 1 - \xi_m 14:08
yeah, it is some paper by Thorsten Joachims doing that fast, but it doesn't help 14:08
why 1? :) 14:08
margin fixed to 1 14:08
like in svm 14:08
I am thinking about ECOC training of svm 14:09
like in mc-svm, like in structured output learning 14:09
and I don't know how to formulate this boundary 14:09
something makes me think there won't be a 1 :) 14:09
in words: f(good_x) - f(other_x) > 1 - sth 14:10
anyway, back to work 14:10
it looks like you work in an iron mine 14:10
:) 14:11
-!- faktotum [~cronor@fb.ml.tu-berlin.de] has joined #shogun 14:45
Hello! 14:46
hi 14:46
is there a possibility to set a custom sparse kernel? 14:46
i know there are sparse kernels and that you can set custom kernels. but how do you set custom sparse kernels? 14:46
i'm using the python module, if that is of interest 14:46
I see.. I guess it is not yet implemented 14:46
but I think it is pretty straightforward to implement 14:47
my current workaround is to do a Cholesky K = LL* decomposition and then use L as sparse feature vectors, but that is not tractable with bigger matrices 14:47
I am not sure I understood why you do the Cholesky 14:48
faktotum: sounds like an easy task to add - patches welcome :) 14:48
i will try it tonight 14:48
blackburn: if i have K = LL* i can set my sparse features to L and then use a linear kernel. then i would end up with K as a custom sparse kernel 14:50
whoa, I see 14:50
faktotum: depending on how sparse things are, you could just use SGSparseMatrix 14:51
but it is not fast enough I guess.. 14:51
oh, that was my idea, why is it not fast enough? 14:53
faktotum: probably it would be slow in terms of checking whether k_{i,j} is zero 15:02
ok, but doesn't the kernel created from sparse real features have the same problem? 15:04
it does as well.. 15:05
I guess some hash map should be used here 15:05
faktotum: anyway, Cholesky of some, say, 4000x4000 matrix is pretty slow ;) 15:17
ha! don't try 15k x 15k! 15:23
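A tiny NumPy sketch of faktotum's workaround: if K = LL^T (Cholesky), then the linear kernel (pairwise dot products) between the rows of L reproduces K exactly, so L can be plugged in as (sparse) features. Illustrative only; the positive-definite K here is randomly generated.

    import numpy as np

    # build some symmetric positive-definite custom kernel matrix K
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    K = A @ A.T + 5 * np.eye(5)

    # Cholesky factor: K = L @ L.T
    L = np.linalg.cholesky(K)

    # treating the rows of L as feature vectors, the linear kernel
    # (pairwise dot products) recovers the original custom kernel
    K_linear = L @ L.T
    assert np.allclose(K, K_linear)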
15k x 15k?! 15:23
that probably takes a lot of memory :) 15:24
chompack has a sparse Cholesky decomposition implemented 15:24
ah 15:24
I guess a dense 15K one would never finish 15:25
-!- vikram360 [~vikram360@117.192.190.106] has joined #shogun 16:35
-!- blackburn [5bdfb203@gateway/web/freenode/ip.91.223.178.3] has quit [Quit: Page closed] 16:38
faktotum: maybe it is good enough: basically finding the kernel row is fast, but not finding the column 16:43
if it is really sparse, some kind of hashmap of tuples or whatever could be faster... 16:44
but a lot of overhead then 16:44
faktotum: so please go ahead with the sparse matrix idea - should do the job 16:49
-!- in3xes [~in3xes@180.149.49.227] has joined #shogun 16:50
-!- cronor [~cronor@141.23.80.206] has joined #shogun 16:52
-!- faktotum [~cronor@fb.ml.tu-berlin.de] has quit [Ping timeout: 260 seconds] 16:56
-!- cronor [~cronor@141.23.80.206] has quit [Remote host closed the connection] 17:00
-!- cronor [~cronor@fb.ml.tu-berlin.de] has joined #shogun 17:00
-!- cronor [~cronor@fb.ml.tu-berlin.de] has quit [Quit: cronor] 17:07
-!- cronor [~cronor@fb.ml.tu-berlin.de] has joined #shogun 17:17
-!- in3xes [~in3xes@180.149.49.227] has quit [Quit: Leaving] 17:19
-!- cronor_ [~cronor@141.23.80.206] has joined #shogun 17:21
-!- cronor [~cronor@fb.ml.tu-berlin.de] has quit [Ping timeout: 260 seconds] 17:23
-!- cronor_ is now known as cronor 17:23
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun 17:28
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] 17:50
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun 17:50
I know this is probably a n00b question, but there seems to be very little information about it: in what way is the C5.0 algorithm better than C4.5? 17:53
-!- wiking_ [~wiking@huwico/staff/wiking] has joined #shogun 18:01
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 244 seconds] 18:02
-!- wiking_ is now known as wiking 18:02
-!- cronor [~cronor@141.23.80.206] has quit [Quit: cronor] 18:52
-!- axitkhurana [~akshit@14.98.55.250] has joined #shogun 19:29
-!- axitkhurana [~akshit@14.98.55.250] has left #shogun [] 19:29
-!- blackburn [~qdrgsm@188.168.4.3] has joined #shogun 19:47
-!- blackburn [~qdrgsm@188.168.4.3] has quit [Quit: Leaving.] 20:12
vikram360, it is not clear to me either - all I know is that there were papers showing that it is better... 22:46
there just was no open source impl. of c5.0 around 22:46
and for c4.5 only some free-for-academic-use thingy 22:47
so people tried c4.5 if they could, but that's it 22:48
ahh, btw weka has a java version of c4.5 (iirc called J48) that probably has much cleaner code 22:48
sonney2k: hey! I read that earlier you talked with blackburn about QDA 23:05
sonney2k: I have been reading into it so I could implement it in shogun 23:06
sonney2k: but I am not really sure how to relate what I have read about it with what you said before 23:06
sonney2k: so it seems that QDA and LDA are similar in that they assume that the feature vectors follow a normal distribution, but LDA assumes that the distributions for all the classes have the same covariance, while QDA doesn't make that assumption 23:08
sonney2k: is that right so far? 23:08
I guess so - at least in LDA, when the cov matrices are considered the same, the problem becomes linear 23:17
sonney2k: ok, so I understand that 23:21
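A minimal NumPy sketch of the distinction n4nd0 just summarized: the Gaussian class-conditional discriminant, which is QDA when each class keeps its own covariance and LDA when a single covariance is shared (the quadratic term then cancels between classes, leaving a linear rule). Illustrative only, not the eventual shogun implementation.

    import numpy as np

    def qda_discriminant(x, mean_k, cov_k, prior_k):
        # quadratic discriminant with per-class covariance cov_k:
        # delta_k(x) = -1/2 log|S_k| - 1/2 (x-mu_k)' S_k^{-1} (x-mu_k) + log p_k
        diff = x - mean_k
        sign, logdet = np.linalg.slogdet(cov_k)
        return (-0.5 * logdet
                - 0.5 * diff @ np.linalg.solve(cov_k, diff)
                + np.log(prior_k))

    # LDA is the same formula with one covariance shared by all classes;
    # the terms that are quadratic in x are then identical across classes
    # and cancel when comparing discriminants, so the decision rule is linear.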
sonney2k: but is it then equivalent to use LDA with polynomial features? 23:21
I mean, can we just make polynomial features from the original ones (e.g. if we have at the beginning x1 and x2, we expand the feature vectors so they also contain x1^2, x2^2 and x1*x2) 23:23
would solving that with LDA be equivalent to QDA? 23:23
n4nd0, it must be very close, but I am not sure if it is exactly the same 23:26
the best description of LDA/QDA I found is https://onlinecourses.science.psu.edu/stat857/book/export/html/17 23:27
sonney2k: cool, thank you very much, I was using this reference http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-4389.pdf 23:31
I have some trouble when it gets into the regularization part 23:32
hmmh, it seems like QDA and LDA-on-quadratic-features results differ, but it is always mentioned that one can use it to get a quadratic classifier ... 23:34
so do you think it would be interesting to add QDA to shogun? and if so, how? 23:39
something similar to the LDA that is already implemented, using regularization? 23:39
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking] 23:42
--- Log closed Tue Mar 06 00:00:19 2012
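For the regularization part n4nd0 asks about above: the SLAC reference is Friedman's regularized discriminant analysis, whose core is shrinking each class covariance toward the pooled one and then toward a scaled identity (the X + delta*I idea mentioned earlier in the log). A simplified sketch of that blend; the parameter names lam/gamma are illustrative, and this is not a proposed shogun implementation.

    import numpy as np

    def regularized_cov(cov_k, cov_pooled, lam, gamma):
        # blend the class covariance with the pooled one: lam = 1 recovers an
        # LDA-style shared covariance, lam = 0 keeps plain QDA
        blended = (1 - lam) * cov_k + lam * cov_pooled
        # then shrink toward a scaled identity to keep the estimate invertible
        d = blended.shape[0]
        return (1 - gamma) * blended + gamma * (np.trace(blended) / d) * np.eye(d)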