-!- parijat [75dc2d05@gateway/web/freenode/ip.] has joined #shogun08:17
parijatiglesiasg: hi!08:17
parijatiglesiasg: I just saw your comments. About using real_max08:18
parijatiglesiasg: actually CHAID handles MISSING value as just another attribute value08:19
parijatiglesiasg: so implementation-wise using MISSING as a valid float value gives me a lot of advantage08:20
@iglesiasgparijat, ok08:21
parijatiglesiasg: for example if you see merge method for ordinal attributes, there is nothing special done for missing attribute08:21
@iglesiasgparijat, I trust your criteria :)08:21
parijatiglesiasg: :)08:22
@iglesiasgparijat, maybe I missed it, but did you mention in the doc that the user should use real_max to denote missing values?08:22
parijatiglesiasg: nope not at the brief part! I should do it probably. I have mentioned it in the declaration place08:23
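The point about treating MISSING as just another attribute value can be sketched as follows. This is a minimal illustration, not the Shogun implementation: `MISSING` uses Python's largest finite float as a stand-in for the `real_max` sentinel mentioned above, and `category_counts` is a hypothetical helper.

```python
import sys
from collections import Counter

# Assumption: stand-in for the real_max sentinel discussed in the chat.
MISSING = sys.float_info.max

def category_counts(values):
    """Count occurrences per attribute value; MISSING is simply one more key."""
    return Counter(values)

# A feature column where two entries are missing.
feature = [1.0, 2.0, MISSING, 2.0, MISSING, 3.0]
counts = category_counts(feature)
```

Because the sentinel is an ordinary float, no special-case branch is needed when grouping or merging attribute values.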
parijatiglesiasg: This CHAID PR really got big without me realizing beforehand!08:24
@iglesiasgall right08:26
@iglesiasggtg now, see you later guys!08:26
parijatiglesiasg: bbye!08:27
@iglesiasgparijat, let me know in the PR if you plan to update something or I should just merge08:27
lambdayHeikoS: hey! do you have a min?09:17
@HeikoSlambday:  hello!09:19
@HeikoSlambday: how are things going?09:19
lambdayHeikoS: hey09:19
lambdayHeikoS: good actually09:19
@HeikoSnice, good to hear!09:19
lambdayHeikoS: so I made a quick python implementation to check results against09:19
lambdayand things are sane09:19
lambdayHeikoS: with this new design, adding b-test would just take 15 mins09:20
@HeikoSlambday: cool! very nice to hear that09:20
@HeikoSlambday: yeah quick python implementation is the way to test against09:21
lambdayHeikoS: yeah mentioned the link in commit msg09:21
@HeikoSlambday: and then put this into a notebook gist, send us for validation, and then put the link into unit test or so09:21
lambdayHeikoS: alright..09:22
@HeikoSlambday: really looking forward to playing with this stuff btw!09:22
lambdayHeikoS: I'm just checking how many integration tests I broke btw :D09:22
lambdayHeikoS: yeah! here is the python code btw -
lambdayHeikoS: you and Dino please check this once you get some free time... if this one is at least okay09:23
lambdayHeikoS: then its easier to verify shogun code09:23
lambdaymade some changes later for generating unit-tests values :/09:25
@HeikoSlambday: yeah the unit tests are important. integration tests not so much for this one09:25
@HeikoSlambday: but if you break them, make sure to look at the errors before re-generating09:26
@HeikoSlambday: you know there is linspace in numpy? :)09:26
lambdayHeikoS: oh they have it already!09:26
lambdayargh I wrote creepy lambda stuff09:27
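For reference, `numpy.linspace` returns evenly spaced values over a closed interval, which covers what a hand-written grid helper would do. A minimal sketch; the interval, the point count, and `manual_grid` are arbitrary illustration choices, not taken from the code discussed here.

```python
import numpy as np

def manual_grid(start, stop, n):
    """Hand-rolled grid: n evenly spaced points on [start, stop], endpoints included."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

# The same grid with the built-in; the endpoint is included by default.
grid = np.linspace(0.0, 1.0, 5)
```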
@HeikoSlambday: code looks fine btw, gotta check the things in detail but good on first look09:27
lambdayHeikoS: okay.. let me know if its okay later when you get some free time09:28
lambdayHeikoS: meanwhile I'm fixing broken integration tests and then will send PR09:28
@HeikoSlambday: I am free now, what else is up?09:28
lambdayHeikoS: alright so I pushed the code to my fork already09:28
lambdaythe unit-tests agree with the results from the python code09:30
lambdayso, for other dependent stuffs, I used local machine values for unit-tests09:30
lambdayspecially kernel selection unit-tests09:30
@HeikoSlambday: okay09:34
@HeikoSlambday: yeah I mean the unit tests so far were written this way09:34
@HeikoSlambday: I just did stuff on my machine locally (from my MSc thesis code) and then put the numbers in, which is pretty bad09:35
@HeikoSso it cant be worse ;)09:35
lambdayHeikoS: haha :D having a reference implementation is best.. I don't have any for kernel selection stuffs..09:36
lambdayHeikoS: but I guess as long as basic statistic/var computation is verified with external code, other dependent stuffs follow from it - so it sort of works as before09:37
@HeikoSlambday: yeah09:52
@HeikoSlambday: so the kernel selection stuff worked fine09:52
@HeikoSlambday: its very easy after all09:52
@HeikoSlambday: only the kernel combinations might be changed since they need all of the h-terms and their covariance09:53
@HeikoSlambday: btw feel free to use your new awesome linalg backend for all spectral things :)09:53
@HeikoSthoralf: hey!09:53
lambdayHeikoS: yeah - well, the covariance stuff is changed a bit in compute_statistic_and_Q .. basically what I did is to stream 2 blocks from both the distributions and computed online cov... no h-vectors are there anymore since statistic directly computes a scalar value blockwise in the new formula09:56
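The blockwise idea described above can be sketched roughly as follows: each pair of blocks yields one scalar estimate, and the mean and variance of those scalars are updated online, so no per-sample h-vector is stored. This is a sketch under assumptions, not the Shogun code: it uses a single Gaussian kernel (so the covariance reduces to a variance), Welford's update for the online part, and all function names, block sizes, and data are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * sigma**2))

def block_mmd2(x, y, sigma=1.0):
    """Scalar MMD^2 estimate on one pair of equally sized blocks (diagonals dropped)."""
    b = len(x)
    off_diag_mean = lambda k: (k.sum() - np.trace(k)) / (b * (b - 1))
    return (off_diag_mean(gaussian_kernel(x, x, sigma))
            + off_diag_mean(gaussian_kernel(y, y, sigma))
            - 2.0 * off_diag_mean(gaussian_kernel(x, y, sigma)))

def streaming_statistic(blocks, sigma=1.0):
    """Welford update over the block estimates; returns their mean and sample variance."""
    count, mean, m2 = 0, 0.0, 0.0
    for x, y in blocks:
        value = block_mmd2(x, y, sigma)
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    return mean, m2 / (count - 1)

# Hypothetical data: 50 block pairs of size 10 in 2 dimensions, mean shift 0.5.
rng = np.random.default_rng(0)
blocks = [(rng.normal(size=(10, 2)), rng.normal(0.5, 1.0, size=(10, 2)))
          for _ in range(50)]
statistic, variance = streaming_statistic(blocks)
```

The returned variance is that of the individual block estimates; dividing it by the number of blocks gives an estimate of the variance of the averaged statistic.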
lambdayHeikoS: with the new implementation, the weights in the kernel selection are *pretty* close to what they were before09:57
lambdaylinalg - yeah! as soon as it makes it to develop :)10:01
@HeikoSlambday: cool, that is good, they cannot be the same, but close is fine10:08
@HeikoSlambday: more important that the statistic/variance is correct and the kernel selection procedure should do the rest, btw feel free to review this code a bit and point out weaknesses10:09
@HeikoSlambday: did besser82 or wiking ever get back to you on the merge of linalg?10:09
lambdayHeikoS: umm.. no10:09
lambdayHeikoS: btw khalednasr had some great ideas regarding using a wrapper class for viennacl matrices10:10
lambdayHeikoS: that was one big issue with the existing thing (remember why you asked to add at least a few more methods) :D10:11
lambdaybut that's solvable using his ideas10:11
@HeikoSlambday: yeah I saw his suggestions. +1 from my side12:18
@HeikoSlambday: I dont know where everyone is, but we should aim to merge this stuff soon.12:19
@HeikoSlambday: what about some cholesky and triangular solvers?12:19
lambdayHeikoS: well, before that I think this viennacl matrix wrapper stuff should be finished12:20
lambdayso we'd get a clear idea whether we can use their matrices as well - since they too have many solvers12:20
lambdayif khaled's idea turns out to be great, then rest can be added more easily12:21
@HeikoSlambday: I see, cool!12:21
lambdayHeikoS: man I've been experimenting with p-values with old and new...12:21
lambdayHeikoS: is it normal for linear time mmd to give higher type 1 errors?12:22
lambdayeven in the previous implementation its pretty high12:22
@HeikoSlambday: type1?12:45
@HeikoSlambday: that depends on the quality of the asymptotic approximations to the null12:45
@HeikoShow many samples?12:45
@HeikoSyou can always check that via sampling the null12:45
@HeikoSand look at the histogram12:46
@HeikoSif it looks very Gaussian, the type1 error should be exactly as desired12:46
lambdayHeikoS: I was working with existing examples only... tried 1000,10000,100000 samples with 0.5 mean shift12:46
lambdayyeah plot is better.. trying that12:47
@HeikoSlambday: if the plot is good but the type1 is wrong, then there is a bug12:52
lambdayHeikoS: what ideally should be the correct type1 error?12:53
lambdayHeikoS: plot is pretty gaussian12:53
lambdaywith mean around 012:53
@HeikoStype1 error is asymptotically equal to p-value13:11
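The check suggested above (sample the null, then compare the rejection rate with the desired level) might look like the sketch below. It is only an illustration under stated assumptions: a one-dimensional Gaussian kernel, the linear-time MMD with a Gaussian approximation of the null, and hypothetical names and sizes throughout. With both samples drawn from the same distribution, the fraction of rejections should land close to the chosen level `alpha`.

```python
import math
import numpy as np

def linear_time_mmd_pvalue(x, y, sigma=1.0):
    """One-sided p-value of the linear-time MMD under a Gaussian null approximation."""
    k = lambda a, b: np.exp(-(a - b) ** 2 / (2.0 * sigma**2))
    x1, x2 = x[0::2], x[1::2]
    y1, y2 = y[0::2], y[1::2]
    # One h-term per pair of consecutive samples.
    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    z = h.mean() / math.sqrt(h.var(ddof=1) / len(h))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Sample the null: both samples come from the same standard normal.
rng = np.random.default_rng(0)
alpha, trials, n = 0.05, 1000, 1000
rejections = sum(
    linear_time_mmd_pvalue(rng.normal(size=n), rng.normal(size=n)) < alpha
    for _ in range(trials)
)
type1_rate = rejections / trials
```

Collecting the standardized statistics instead of the p-values and plotting their histogram gives the visual check mentioned in the chat.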
lambdayHeikoS: I guess then it makes sense - I also tried changing the blocksize and as Dino mentioned in the writeup, this approach gives correct variance estimate when blocksize is small but overestimates when blocksize is large13:21
lambdaymade some plots13:21
@HeikoSlambday: put them into a notebook, we can re-use this later13:21
lambdayHeikoS: alright13:22
lambdayHeikoS: data is going to change for integration tests - I'm first checking whether the changes make sense... then sending pr13:23
@HeikoSlambday: yeah, this is fine, just monitor whether things are like totally different13:24
@HeikoSbut on the other hand if everything is properly unit tested then we should be fine13:24
lambdayHeikoS: yeah its all unit-tested...13:24
