--- Log opened Fri Apr 22 00:00:36 2011
serialhex_evening all... how is everyone (whos awake :D )02:34
-!- serialhex_ is now known as serialhex02:37
serialhexhmm... i guess everyone is afk or asleep02:41
* serialhex thinks it sucks that when he's awake everyone else is asleep, or vice versa02:41
-!- sploving [~sploving@] has left #shogun []02:56
-!- dvevre [b49531e3@gateway/web/freenode/ip.] has quit [Quit: Page closed]03:00
-!- ameerkat [] has joined #shogun08:23
-!- confused [8092a059@gateway/web/freenode/ip.] has joined #shogun08:40
confusedneed some help with linearstring kernel .. can anyone help?08:42
-!- confused [8092a059@gateway/web/freenode/ip.] has left #shogun []08:45
-!- akhil_ [75d35896@gateway/web/freenode/ip.] has joined #shogun09:13
-!- sploving [~sploving@] has joined #shogun09:43
-!- ameerkat [] has quit [Ping timeout: 260 seconds]10:08
-!- akhil_ [75d35896@gateway/web/freenode/ip.] has quit [Quit: Page closed]10:11
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has joined #shogun10:54
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has left #shogun []11:51
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has joined #shogun12:42
@sonney2khmmhh just a little more than 8hrs until the deduplication meeting
splovingyeap. In our country, we are sleeping.12:49
splovingAre there many students who applied to more than one org?12:51
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has quit [Quit: Page closed]13:29
-!- emrecelikten [c18cf902@gateway/web/freenode/ip.] has joined #shogun13:37
josipsonney2k: we still won't know until Monday :/13:53
-!- emrecelikten [c18cf902@gateway/web/freenode/ip.] has quit [Quit: Page closed]14:16
@sonney2ksploving, in our case so far only you told us.14:19
-!- blackburn [~qdrgsm@] has joined #shogun14:21
@sonney2kanyway I hope there won't be any surprises after the meeting14:25
blackburnwhat I missed? which surprises? ;)14:26
josipsurprise = someone got into 10 projects ? :))14:46
josipthat's a very nice surprise .. for the student at least :)14:46
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has joined #shogun14:55
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has quit [Quit: Page closed]15:01
blackburnsonney2k: need some help15:13
-!- dvevre [~in3xes@] has joined #shogun16:38
-!- dvevre [~in3xes@] has quit [Quit: Ooops..!!]16:58
-!- lionelc_ [4c681efd@gateway/web/freenode/ip.] has joined #shogun17:10
-!- sploving_1 [~sploving@] has joined #shogun17:48
-!- sploving [~sploving@] has quit [Ping timeout: 240 seconds]17:50
-!- sploving [~sploving@] has joined #shogun18:18
-!- sploving_1 [~sploving@] has quit [Ping timeout: 252 seconds]18:20
-!- dvevre [b49531e3@gateway/web/freenode/ip.] has joined #shogun18:54
-!- dvevre_ [b49531e3@gateway/web/freenode/ip.] has joined #shogun19:05
-!- dvevre [b49531e3@gateway/web/freenode/ip.] has quit [Ping timeout: 252 seconds]19:07
-!- dvevre_ is now known as dvevre19:07
@sonney2kblackburn, yes?20:06
blackburnsonney2k: :) I think it is not necessary now20:06
josipfuck, wrong window.20:06
-!- ameerkat [] has joined #shogun20:06
blackburnwe heart you too20:07
blackburnsonney2k: hm.. but may be you can help me to grasp one thing in ROC20:07
@sonney2kblackburn, what was the problem (though I am loving it when questions resolve themselves)20:08
blackburnsonney2k: the problem was my not_understanding how to return matrix in modular, but it is solved now20:08
@sonney2kblackburn, great!20:08
@sonney2kso how can I help about ROC?20:09
blackburnjust looked to typemaps and now understand20:09
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has joined #shogun20:09
blackburnin that algo (do you recall it?) there is if f_prev!=f of i-th example20:09
blackburnso if there will be equal answers ROC will be 'shorter'20:09
josipblackburn: it has as many points as there are unique thresholds + 120:10
josipin my understanding20:10
blackburnjosip: yes but why it should?20:11
blackburnand the problem for me is how to determine it earlier20:11
josipcount the number of unique thresholds20:11
@sonney2kerr, it should have the same number of points as you have predictions20:12
blackburnsome 'overhead' for counting it20:12
josipsonney2k: then you'll have copies of the same point20:12
josipassume all of the points have the same likelihood. Then there is only one fp rate and only one tp rate20:13
@sonney2kjosip, if you used the underestimated/overestimated ROC then yes, but for the expected ROC curve it is ok20:13
* sonney2k is looking for the roc paper link20:13
blackburnsonney2k: so may be I should remove that check?20:13
josipblackburn: it depends on how you document it, the graph will look the same and the integral will be the same20:14
josipeither way20:14
blackburnjosip: I see, thank you20:15
josipbut I'm not the ultimate authority, so I might be mistaken20:15
@sonney2kjosip, page 14 on
josipyes, so the algorithm cramps all of them in one point20:17
josiplemme recheck it20:17
@sonney2kjosip, when you have multiple points with the same output then you can get into this problem - so you either have a jump to the right or use the expected curve20:17
@sonney2kthen it is not just one point20:17
@sonney2kbut a diagonal20:17
blackburnhehe how silly I am.. it is described in fawcett's paper20:18
@sonney2kblackburn, yes he is the roc guru (gave tutorials etc on this)20:18
blackburnsonney2k: so that check is necessary, right?20:18
@sonney2kblackburn, he also has this nice algorithm (algorithm 2 in that paper)20:19
josipbut assume that you plot it by varying the threshold and not from the sample which you will do in "production". Then you will classify all points with the same threshold either +1 or -120:19
@sonney2knot necessary no20:19
blackburnI'm so ashamed with so late acknowledgement with ROC :)20:19
josipblackburn: they're mostly used in medical stuff, because the classes are very disproportionate (i.e. 0.1% chance of having breast cancer)20:19
@sonney2kjosip, yes when you apply in practise you give a point either +1 or -120:19
josipblackburn: and a classifier that just classifies everyone as not sick will have 99.9% accuracy on an iid sample from the distribution20:20
blackburnjosip: I see20:21
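
josip's point about disproportionate classes can be checked with a few lines. This is only a sketch on a made-up sample (1 sick case in 1000, matching the 0.1% figure above); the function name `accuracy_and_tpr` and the +1/-1 label convention are assumptions for illustration, not shogun code.

```python
def accuracy_and_tpr(labels, predictions):
    """Return (accuracy, true-positive rate) for +1/-1 labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    positives = sum(1 for y in labels if y == 1)
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    return correct / len(labels), true_pos / positives


# Hypothetical sample: 1 sick (+1) out of 1000 people, i.e. 0.1% prevalence.
labels = [1] + [-1] * 999
# A "classifier" that calls everyone healthy.
predictions = [-1] * 1000

acc, tpr = accuracy_and_tpr(labels, predictions)
print(acc, tpr)
```

The accuracy looks excellent while the true-positive rate shows that not a single sick case was found, which is why a single accuracy number says little here.
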
@sonney2kI can only recommend to always use ROC curves when comparing methods20:21
blackburnthank you, it became clearer to me20:21
@sonney2kaccuracy etc is a useless measure (if that is not exactly what counts later on)20:22
josipsonney2k: but most people do use only accuracy20:22
@sonney2knote that ROC curves are still not optimal when you have *really* unbalanced data sets20:22
@sonney2kthen you should use precision recall curves20:23
blackburnPRC is my next work :)20:23
@sonney2kjosip, that is bad.20:23
@sonney2kjosip, the user should decide which fp-rate he can afford or how many tp's he needs.20:24
josipI agree20:25
josipa single number (+ a CI maybe ..) is just not enough, unless classes are evenly distributed20:25
* serialhex wishes he could matrix-style download some ML textbooks into his head :P20:26
josipI got confused for a moment :) You mean The Matrix-style :D20:27
serialhexyeah, that too20:27
@sonney2kblackburn, reading fawcetts description about the algorithm - he indeed suggests to drop points with same output and to draw a connecting line instead20:28
@sonney2kblackburn, so indeed output can be shorter20:28
* serialhex will bbl, to learn some more...20:29
blackburnsonney2k: will count different outputs for that20:29
@sonney2kblackburn, however I would just allocate an array #outputs20:29
@sonney2kdoesn't matter20:29
@sonney2kI mean you can allocate a slightly bigger array and then just return the shortened size20:29
blackburnhmm.. I can, but is it a good practice..?20:30
@sonney2kit rarely happens that you have 1e6 outputs that are all the same and so you only have (0,0) and (1,1) in the end20:30
@sonney2kI don't see a problem... it is wasting memory but that's all (no leaks no other side effects)20:31
blackburnokay, just didn't know how you consider it ;)20:31
blackburnthank you, sonney2k, josip20:32
blackburnhave to go now, will test that ROCEvaluation a bit later and will make a pull request20:33
blackburnsome wasting sounds better for me too, because checking will be another one O(n)20:34
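
The approach discussed above (skip a point while the output equals the previous one, allocate for the worst case, return only the used part) might look roughly like this. It is a sketch in Python/NumPy, not the ROCEvaluation code blackburn is writing; `roc_points` is a hypothetical name, labels are assumed to be +1/-1, and both classes are assumed to be present.

```python
import numpy as np


def roc_points(scores, labels):
    """ROC points for +1/-1 labels; tied scores collapse into one point.

    Returns a (2, k) array: row 0 = false-positive rate, row 1 = true-positive rate.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    n_pos = int(np.sum(labels == 1))
    n_neg = len(labels) - n_pos
    order = np.argsort(-scores, kind="stable")  # decreasing score

    # Worst case: one point per output plus the final (1, 1).
    # Allocate that much once and return only the columns actually used.
    points = np.empty((2, len(scores) + 1))
    used = 0
    tp = fp = 0
    prev = None
    for i in order:
        # The "f_prev != f" check: emit a point only when the score changes.
        if prev is None or scores[i] != prev:
            points[:, used] = (fp / n_neg, tp / n_pos)
            used += 1
            prev = scores[i]
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
    points[:, used] = (fp / n_neg, tp / n_pos)  # last point is always (1, 1)
    used += 1
    return points[:, :used]


# Two examples share the score 0.8, so 3 unique scores give 4 points.
curve = roc_points([0.9, 0.8, 0.8, 0.1], [1, 1, -1, -1])
print(curve.shape)
```

With this check the result has as many points as there are unique outputs plus one, as josip said; if every output is identical only (0,0) and (1,1) remain.
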
@sonney2kjosip, what I don't know though is how one treats duplicate outputs in PR-Curves20:37
@sonney2kI only know that each point in ROC space can be translated to a point in PR space20:37
josipwell the graph will look the same and also the integral20:38
josipwhich is what you care about, right?20:39
@sonney2kjosip, not true unfortunately20:40
@sonney2kif you don't skip points with same output you run into problems20:40
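
The translation sonney2k mentions, from a point in ROC space to a point in PR space, only needs the class counts. A minimal sketch, assuming the hypothetical helper name `roc_to_pr` and that the numbers of positives and negatives are known; it does not settle how duplicate outputs should be treated in PR curves.

```python
def roc_to_pr(fpr, tpr, n_pos, n_neg):
    """Map one ROC point to (recall, precision), given the class counts."""
    tp = tpr * n_pos  # true positives at this threshold
    fp = fpr * n_neg  # false positives at this threshold
    if tp + fp == 0:
        # Nothing is predicted positive, so precision is undefined here.
        return tpr, None
    return tpr, tp / (tp + fp)


# ROC point (fpr=0.5, tpr=1.0) with 2 positives and 2 negatives.
recall, precision = roc_to_pr(0.5, 1.0, 2, 2)
print(recall, precision)
```

Recall is simply the true-positive rate; precision depends on the class balance, so the same ROC point maps to different PR points for different data sets.
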
josipthat's true, I thought you meant something different20:41
josipI gtg, ttyl20:42
-!- sonney2k is now known as shogun|sonney2k20:42
* shogun|sonney2k prepares for the dedup meeting20:43
@shogun|sonney2kbye josip20:44
@shogun|sonney2kit is a crazy amount of messages per minute in #gsoc20:46
-!- warpy [] has joined #shogun20:59
warpyanyone here developed a typemap before ?20:59
@shogun|sonney2k627 people in #gsoc - just crazy21:03
warpyoh hey there, didnt see you21:04
warpylong nick name21:04
@shogun|sonney2kgreat isn't it21:05
@shogun|sonney2krequired in the dedup thing going on in #gsoc21:05
warpyshogun|sonney2k, someone i work with is trying to figure out how to do a c# port21:05
warpyi told him i'd ask a few questions, many things are still very much a blur to us21:05
@shogun|sonney2kask ... but don't expect me to answer immediately - I have to check if we get any dupes due to the reshuffling going on in #gsoc21:07
warpywould you mind explaining the core difference between the static and dynamic interfaces21:07
@shogun|sonney2kwarpy, one is swig based and each C++ object will become an object in $LANGUAGE21:08
@shogun|sonney2kstatic is just some global variables with a few objects assigned.21:08
warpya c++ object is (a function, a class?)?21:09
@shogun|sonney2k+ associated functions21:09
warpyand $LANGUAGE is ?21:10
@shogun|sonney2kwhat you want C# or python or...21:10
warpyso basically every c++ object will become another object in the designated language21:10
warpydoes it mean that we have to create the code of the $language's object that deals with the c++ object?21:12
warpyfor each and every one ?21:12
@shogun|sonney2kswig does that for you21:12
warpycan you explain a little on how, i am guessing it doesnt convert c++ to c#21:13
@shogun|sonney2kwarpy, no I cannot - check the swig homepage...21:15
-!- shogun|sonney2k is now known as sonney2k21:16
@sonney2kdedup meeting is over21:17
warpyregarding static interfaces. how do they work in concept. so you have global variables and a few classes. do you need a wrapper class to interface with a c++ class ?21:17
warpymaybe we should take some class as an example21:18
@sonney2kwarpy, we have set/get* interface functions for all kinds of things (scalars, vectors, matrices,...)21:18
@sonney2kand each $lang has to implement these21:18
warpyokay so both interface types static and dynamic need to create new classes to deal with all the input types ?21:19
@sonney2kglobal variables are then for e.g. classifiers - so one can only have one classifier21:19
@sonney2kin static interfaces you need these set/get functions (that we have defined in our SGInterface)21:20
@sonney2kin the modular ones you write typemaps21:20
warpyso in the static one you would write your set get functions in the designated language, this is actually a wrapper for the c++ functions, correct ?21:21
@sonney2kwarpy, no you write them in C++21:22
@sonney2kand utilize your native interface (e.g. in java JNI, in python c extensions etc)21:22
-!- akhil__ [75d35896@gateway/web/freenode/ip.] has quit [Quit: Page closed]21:23
warpyso the static interface is always written in c++ but uses libraries from the designated language  ?21:24
@sonney2kwarpy, yes - but the typemaps are also C or C++ and utilize the libraries21:24
@sonney2kso in that respect it is similar though modular is more powerful as you really have access to all C++ objects in your $lang21:25
warpyokay so the main difference between static and dynamic is that static is limited to functions and classes that you yourself predefined and dynamic by using swig has access to everything21:26
@sonney2kwarpy, yes21:26
warpyawesome, regarding the type map itself. the whole development process.21:28
warpybasically what i would need is to load the project in a c++ ide and first be able to compile it21:29
warpyso far so good ?21:29
warpyis it possible to use vc++, or any windows based ide ?21:31
@sonney2kI've only ever compiled it under posix platforms + cygwin21:31
@sonney2kwarpy, I suspect that will not work out of the box21:31
@sonney2kwe use posix threads and signals for example21:32
warpysorry, posix ?21:32
warpycygwin i have used before21:32
warpyokay so that the only way you have for windows so far21:33
warpyso lets say it compiles under windows using cygwin etc21:33
warpyi believe you have a tutorial on your site for windows compiling, correct ?21:34
@sonney2kwarpy, just get all the dependencies... and compile21:35
warpyif you remember, you prepared shogun to support c#. how does a typemap point to that. is it some kind of directive inside the code ?21:37
@sonney2kwarpy, you need to write the file csharp/swig_typemaps.i21:41
@sonney2kwarpy, sploving is the expert to ask here - he actually contributed to swig21:41
blackburnsonney2k: have you had some activity with what dedup?21:45
@sonney2kblackburn, no dupes...21:45
blackburnsonney2k: forgot one awful thing.. I have to know size of matrix because it is 2d21:49
@sonney2kblackburn, argh... you are right21:50
blackburnseems I have to count num of different labels21:50
warpyokay so lets imagine we wrote the typemap and recompiled shogun and everything works21:53
warpynow for the interesting part21:53
warpyhow do i use it with in c# (vc# for example)21:54
warpyis it a matter of doing "using shogun;"21:55
warpysonney2k, still there?22:17
@sonney2kwarpy, yes - no idea. never used csharp before22:17
blackburnsuch irritated with not having much time for finish issues with evaluation!22:18
warpywhat it would be like in other languages ?22:19
blackburnsonney2k: how do you use valgrind when running tests?22:31
blackburnI mean when I ran it, there are many 'pythonic' errors, and I don't see any useful information22:31
@sonney2kwarpy, you really should look at the examples
@sonney2kwarpy, it is intuitive to say the least22:32
@sonney2kblackburn, you either use a reasonable suppressions file or you dump everything into a file and grep for shogun22:32
@sonney2k(later on)22:32
blackburnthank you22:32
blackburn*getting mad cause some memory allocation error*22:33
blackburnI'm such a fool :D22:34
warpyokay looking thanks22:35
blackburnsonney2k: we both failed! I _don't_ have to know the size of the matrix if I return the first N columns..22:42
@sonney2kblackburn, ahh you are right it is column first so UI#H@OI#UH@!!!!22:43
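
Why column-first storage makes this work can be seen with a small NumPy sketch. The sizes are hypothetical and the points are placeholders; the only thing it illustrates is that in a 2 x n column-major buffer the first N columns are exactly the first 2*N elements in memory, so an oversized buffer can be handed back with a smaller column count.

```python
import numpy as np

n_outputs = 6  # hypothetical number of classifier outputs
used = 3       # hypothetical number of ROC points actually produced

# 2 x (n_outputs + 1) buffer in column-major (Fortran) order:
# each column (fpr, tpr) occupies two adjacent memory slots.
buf = np.zeros((2, n_outputs + 1), order="F")
for k in range(used):
    buf[:, k] = (k / (used - 1), k / (used - 1))  # placeholder diagonal points

# Flatten in column-major order, which is the memory order of this buffer.
flat = buf.ravel(order="F")

# The first 2 * used elements, read as a 2 x used column-major matrix,
# are exactly the first `used` columns of the oversized buffer.
first_cols = flat[: 2 * used].reshape((2, used), order="F")
print(np.array_equal(first_cols, buf[:, :used]))
```

With row-major storage the same trick would not work, because the used part of each row would be separated by the unused tail of the previous row.
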
* sonney2k 's brain is off22:43
blackburn:D was?22:43
blackburnsonney2k: you better kick me off cause I'm stupid :D22:44
blackburnsonney2k: yesterday (for you today) was the day shogun's github received no commits22:47
@sonney2kblackburn, hurray :)22:48
blackburnsonney2k: is it? ;) do you like when we aren't committing?22:51
blackburnoh, I finally find the error22:54
@sonney2kblackburn, well we had a screw up commit on that day and I had to force push ... so better no push/pull than such ones!22:55
blackburnsonney2k: what is the commit?22:56
@sonney2kyou won't see it :)22:56
blackburnsonney2k: what it screwed?22:56
@sonney2kand I am too ashamed to disclose what we did :)22:56
blackburnI already had a proper ROC, but now it returned:22:57
blackburnsonney2k: seems to be a strange ROC :D22:57
@sonney2kblackburn, modern art22:59
@sonney2kblackburn, at least it is between 0 and 122:59
blackburnhehe, I screwed something when doing some 'refactoring'23:00
serialhexblackburn: wtf is that image _supposed_ to be??? :P23:07
blackburnserialhex: ROC curve :)23:07
blackburnserialhex: may be like
serialhex is this what your talking about???23:08
serialhexahh, ok cool23:08
* serialhex goes and reads about roc's, and not this kind:
@sonney2kserialhex, cool bird :D23:11
serialhexyes i know23:11
blackburnsomeone is stealing my ROC23:11
* serialhex has played too much D&D23:11
blackburnserialhex: is it you?23:12
serialhexi'm not stealing it :D23:12
blackburnserialhex: I'm watching!23:12
serialhexi'm sure you are23:12
blackburnit draws some fantastic things :D23:13
serialhexreally? such as???23:14
* serialhex waits to see some fantastic drawings23:14
blackburnthe only thing i'm pretty sure: it isn't ROC, it is some memory 'slice' :D23:15
serialhexyeah... you can start a new form of art with it: abstract graph art :D23:15
* serialhex loves wikipedia23:29
* sonney2k yawns23:39
@sonney2kblackburn, btw wikipedia lists the other measures we have on the ROC page too
dvevresonney2k: are the irc logs online somewhere?23:44
@sonney2kdvevre, not yet23:44
@sonney2kdidn't find the time...23:44
dvevrehmm.. okay..23:45
serialhexdvevre: if i can figure out how to extract them from quassel i have a pretty complete log of the chat23:45
* serialhex has logs since ~4 days after the channel opened23:45
dvevreserialhex: you never disconnect? nice!23:45
serialhexdvevre: yeah i almost never turn my computer off23:45
dvevreserialhex: i'm mostly stuck with the webchat version here.. and that too on a lab computer23:47
serialhex:-/ that kind of sucks dvevre!23:47
dvevrehence have to miss out on much of the conversation..23:47
blackburnsonney2k: eh.. they all are already implemented23:53
blackburnexcept MCC23:53
@sonney2kblackburn, I think that one too - we just named it cross_correlation_coefficient or so23:54
@sonney2kI just meant to say - nice table :)23:54
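
For reference, MCC (the Matthews correlation coefficient) is computed from the four confusion-matrix counts. This is a sketch of the standard formula, not shogun's implementation; the function name `mcc` and the choice to return 0.0 when the denominator vanishes are assumptions.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    return num / den if den else 0.0


print(mcc(5, 5, 0, 0))    # perfect predictions
print(mcc(0, 999, 0, 1))  # everyone labelled negative, 1 positive missed
```

Unlike accuracy, the second case does not score well: labelling everyone negative on a heavily unbalanced sample gives 0.0 under the convention assumed here.
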
blackburnsee now23:54
-!- ameerkat [] has quit [Ping timeout: 252 seconds]23:54
--- Log closed Sat Apr 23 00:00:36 2011