good morning sonney2k: there you go
wiking, please update your PR with the trivial changes such that I can merge it!
hi all
hey
n4nd0: how did your holidays go?
blackburn: they were good!
I have been out in the wild camping and so on, it has been fun :)
cool
blackburn: I read about it in the mail wiking sent to the list a couple of days ago
it sounds very good, I would like to be in, yes
n4nd0: we both are just curious how to handle SUCH BIG data
138 GB!
and actually now we need to come up with some features
wow, 138 GB
I am going to come up with some fast HOG and wiking wanted to adapt FREAK
http://infoscience.epfl.ch/record/175537/files/2069.pdf
I am looking forward to trying FREAK actually
it is something faster and more accurate than SIFT/SURF
blackburn: hi!
pluskid: how is it going? you became rare here ;)
blackburn: I hang here, but rarely speak :D
struggling with some 300+ line matlab code...
matlab function
exactly like I do
to turn it into C++
haha
HA
blackburn: what algor are you working on now?
pluskid: https://github.com/shogun-toolbox/shogun/blob/master/src/shogun/lib/slep/slep_solver.cpp that thing is just 8 matlab files (loc ~200) generalized into one solver
pluskid: well I am currently fixing bugs in recently introduced multitask algorithms ported from the MALSAR library
they are basically about the same
regularizers are different :)
hrhr
blackburn: not better here :P, https://github.com/pluskid/shogun/blob/multiclass/src/shogun/multiclass/tree/RelaxedTree.cpp
two big functions
pluskid: btw we now can use eigen3, did you know?
blackburn: don't know :-/
pluskid: here is the example
https://github.com/shogun-toolbox/shogun/blob/master/src/shogun/lib/malsar/malsar_clustered.cpp
there are some double* but they are just for libqp interfacing
blackburn: eigen3 is a matrix library?
yes
cool!
vectorizing actually
so we can expect even some performance impact here
longing for a matrix library in shogun :D
and I guess this would be much easier to code than raw LAPACK
A*B is A*B
not cblas_dgemm(CblasColMajor,CblasNoTrans, ... N,M,K
:D
then what about SGMatrix?
will it stay?
well just store everything in sgmatrices
but in solvers/algorithms
you may use eigen3
I don't like the idea of having eigen3 matrices as members
I see
anyway, such good news!
actually if you want to treat SGMatrix as Eigen3 matrix
you may do that using Map
so it looks like a good idea for me to keep eigen3 only inside
hmm, that's reasonable
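The idea of treating existing storage as a matrix without copying it, and of writing A*B instead of a BLAS call, can be sketched with the standard library alone. This is a hand-rolled stand-in for what Eigen's Map provides, not Eigen itself: MatrixView, Matrix and the operator* below are hypothetical names, and column-major storage is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over a column-major buffer.
// It never allocates or frees the data it points to.
struct MatrixView {
    const double* data;
    std::size_t rows;
    std::size_t cols;

    double operator()(std::size_t i, std::size_t j) const {
        return data[i + j * rows];  // column-major indexing
    }
};

// Small owning type so that a product has somewhere to live.
struct Matrix {
    std::vector<double> data;
    std::size_t rows;
    std::size_t cols;

    MatrixView view() const { return MatrixView{data.data(), rows, cols}; }
};

// Lets solver code read "A * B" instead of spelling out a BLAS call.
Matrix operator*(const MatrixView& a, const MatrixView& b) {
    assert(a.cols == b.rows);
    Matrix c{std::vector<double>(a.rows * b.cols, 0.0), a.rows, b.cols};
    for (std::size_t j = 0; j < b.cols; ++j)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t i = 0; i < a.rows; ++i)
                c.data[i + j * c.rows] += a(i, k) * b(k, j);
    return c;
}
```

The storage type stays the member of the class, and the view exists only inside the algorithm that needs matrix arithmetic, which is the split suggested in the conversation.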
hi
heiko: hey
blackburn heyho
heiko: I am thinking how can I make multitask work with xval or at least subsets
heiko: can we know original indices of subset?
nice
well
original indices?
or initial or any other word
:)
I mean
You could find that out, however, it's expensive
why do you need that?
in multitask you have something like
0:20 is first task
20:40 is second task
and if you apply to first 20 vectors
you have to use first learned w
or learned model generally
can't you do that via the store_model_features?
let me try to think (it's hard for me) :D
heiko: what can I store?
mmh
I must admit that I don't really get the problem you have
for doing xval, you would just need to train/test on different indices
these come from the SplittingStrategy
Now if you want only certain combinations, just build a new splitting strategy
no I need to tell machine 'it is the second task vector, use second model'
which respects your tasks etc
you should do that via the indices
but the cross-validation class just takes the indices and performs train/test
yeah
damn I need to rearrange vectors somehow
or to construct an explicit mapping
I think the splitting strategy would be good for that
because currently it doesn't work with non contiguous tasks
no idea...
heiko: I mean now I am using a vector [0,20,40]
to indicate 0:20 is the first task
if I get something random
like [second task vector, first task vector, second task vector, ... ]
it won't work at all
and I don't really like it :)
but if you know which indices are which tasks
then you could build training/test indices that reflect that
right?
yeah
but still time to change things I think
indices vector would be better here..
might be, yes
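A minimal sketch of the "indices vector" idea, assuming each task simply lists the indices of its vectors. All names here (TaskIndices, from_boundaries, task_of_vector, split_subset_by_task) are hypothetical helpers for illustration, not Shogun API. The boundary form [0,20,40] converts to explicit lists, and a shuffled subset such as a cross-validation fold can then be split per task.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: tasks[t] holds the indices of task t's vectors,
// so tasks do not have to be contiguous ranges.
using TaskIndices = std::vector<std::vector<std::size_t>>;

// Convert the contiguous boundary form, e.g. {0, 20, 40}, to explicit lists.
TaskIndices from_boundaries(const std::vector<std::size_t>& bounds) {
    TaskIndices tasks;
    for (std::size_t t = 0; t + 1 < bounds.size(); ++t) {
        assert(bounds[t] <= bounds[t + 1]);
        std::vector<std::size_t> idx;
        for (std::size_t i = bounds[t]; i < bounds[t + 1]; ++i)
            idx.push_back(i);
        tasks.push_back(idx);
    }
    return tasks;
}

// Inverse map: vector index -> task id. Unassigned vectors get tasks.size().
std::vector<std::size_t> task_of_vector(const TaskIndices& tasks,
                                        std::size_t n_vectors) {
    std::vector<std::size_t> owner(n_vectors, tasks.size());
    for (std::size_t t = 0; t < tasks.size(); ++t)
        for (std::size_t i : tasks[t]) {
            assert(i < n_vectors);
            owner[i] = t;
        }
    return owner;
}

// For a subset given as original indices (in any order), return for each task
// the positions inside the subset that belong to it.
TaskIndices split_subset_by_task(const std::vector<std::size_t>& owner,
                                 std::size_t n_tasks,
                                 const std::vector<std::size_t>& subset) {
    TaskIndices parts(n_tasks);
    for (std::size_t pos = 0; pos < subset.size(); ++pos) {
        assert(subset[pos] < owner.size());
        const std::size_t t = owner[subset[pos]];
        assert(t < n_tasks);
        parts[t].push_back(pos);
    }
    return parts;
}
```

With this layout, a subset ordered like [second task vector, first task vector, second task vector] still resolves each position to the right task, at the cost of one extra index vector per data set.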
blackburn, what do you think about the first section of the tutorial?
let me check
hmm it doesn't compile here :D
yep
heiko: okay so what do you mean first section?
for statistical tests
really?
what error do you get?
probably package missing
I just installed
all is ok now
I meant more like style
I focused more on the algo basics rather than SHOGUN internals
looks cool
exactly
and just saying see shogun class bla
but we need a way of doing this
I mean a uniform way
I currently only added TODO's
well style is something subjective
but I like the style you use and it probably fits to mine
I'll add some stuff in next few days too
kk
How to reference shogun classes/methods?
just mention name?
yeah I think so
and then user can look up class reference
ok
let's put the class names in some courier font
well actually may be we can set up some reference
yeah I thought so
yeah \textt
but by hand=lots of work
tt
no why by hand
http://www.shogun-toolbox.org/doc/en/current/classshogun_1_1CBinaryFile.html
could that be embedded in latex?
\defcommand[1]{\shogunclass}{\href{http://www.shogun-toolbox.org/doc/en/current/classshogun_1_1%1.html}{%1}}
try that
k will do
that might be an idea
ehm
\newcommand
not defcommand :D
^_^
or just put in url to class reference
on dis
disc
or so
dont know
no, why
just use that kind of command
\shogunclass{CBinaryFile} should work
let me try
kk
a few mistakes here :)
oh the weather is sweet finally
whether?
:)
same here
first sunny day in ages
has been raining all the time
well opposite
was way too sunny :D
heiko: \shogunclass{anything}
heiko: I committed it
that's supercool
thanks
is it also in \tt ? :)
yes
heiko: it points to shogun-toolbox doc
so if you click you see doc of the class *magic*
:)
hrhr
heiko: I wonder whether anyone would be able to parse my english in that tutorial :D
lol :)
well we should correct each other
and its a good chance to learn for you :)
I remember I had to fix a lot of wrong usages in my wannabe-jmlr-paper
heiko: I had one idea on authorship
which is? :)
something like \authors{Heiko Strathmann} in section/part
blackburn, I like the command you added
which is shown in footer
ok, good idea
so user can kick ass of correct person
:D
heiko: about style or rather desing
design
I remember there was a cool package to set up section headers
blackburn, there's the git blamelist for that :)
but I agree anyway
and I like fancy chapter headers
like with capital letters
heiko: I actually like fonts
no
I love fonts
:D
heiko: http://www.tug.dk/FontCatalogue/mathfonts.html which do you like most?
lol :)
Well, I think we should stay with the standard since people are used to it.
But I love fancy capital letters at section beginnings
how can people get used to font? ;)
because everybody uses it
I find palatino much more readable than computer modern roman
so it sorts in smoothly
heiko: just try it
\usepackage[sc]{mathpazo}
don't know, let's just play around with that
see what people think :)
heiko, he
can always change, nice thing about LaTeX
heiko: I just ran some script I used before but now with shogun compiled with optimizations
never thought it depends SO MUCH
10 times faster at least
what? blackburn, what did you do?
ah alright
optimisations
yeah
yeah these count massively
take for example loops
I never thought so much
when you don't optimize, i++ takes three operations
++i just one
heiko: do you know whether gcc uses SSE here in loops?
when you optimise, it's always one
no idea
heiko: if I had \infty time I'd analyze asm code
:D
blackburn, I did that in uni intensively, completely boring, believe me, you don't wanna do that
heiko: why did you?
I am a computer-scientist
had to do it
:)
I already changed lots of these courses to math, but couldn't avoid all :D
well I am too but I hadn't ;)
lucky you!
gotta have a coffee now, see you later!
see you
http://cat.shogun-toolbox.org.meowbify.com/
shogun: Sergey Lisitsyn master * r6eea837 / (7 files in 4 dirs): Introduced random search model selection - http://git.io/SMuH5Q
blackburn, random search?
yeah :D
what should that give? :)
no idea
Actually, for high dimensions, this is some kind of genetic programming approach, where you randomly find a good estimate and then refine from there
so useful in conjunction with other methods :)
heiko: well let it be anyway
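As a rough illustration of random search over a single parameter, here is a minimal sketch. It is not the code from the commit above: random_search, SearchResult and the convention that a lower score is better are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <random>

// Hypothetical result type: the best parameter seen and its score.
struct SearchResult {
    double best_param;
    double best_score;
};

// Sample n_trials parameters uniformly from [lo, hi) and keep the one with
// the lowest score. The seed makes a run repeatable.
SearchResult random_search(const std::function<double(double)>& score,
                           double lo, double hi,
                           std::size_t n_trials, unsigned seed) {
    assert(lo < hi && n_trials > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> sample(lo, hi);
    SearchResult best{lo, std::numeric_limits<double>::infinity()};
    for (std::size_t t = 0; t < n_trials; ++t) {
        const double p = sample(rng);
        const double s = score(p);
        if (s < best.best_score)
            best = SearchResult{p, s};
    }
    return best;
}
```

The estimate found this way can serve as the starting point for a finer search around best_param, which matches the "find a good estimate and then refine" use described above.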
blackburn, sure gcc uses sse123, mmx etc
I see
blackburn, there is close to no gain to do assembly tweaks
heiko, ^
sonney2k: no I meant different thing
blackburn, for example I used a for loop to compute a dot product
it is useful to know what compiler produces
vs. one that uses sse
sonney2k, yeah, also it's boring and can be done automatically by software which smart people wrote :)
about the same speed
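A plain-loop dot product of the kind being compared might look like the sketch below; the function name dot is just illustrative. Whether gcc turns the loop into packed SSE instructions depends on the optimization flags and the target. For a floating-point sum like this one, gcc typically needs a flag that permits reordering (for example -ffast-math) before it vectorizes the reduction, and compiling with -S is one way to inspect what was produced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straightforward dot product; the compiler is left to pick the instructions.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}
```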
sonney2k, what do you think about adding a test framework
based on unit tests like googletest?
but when you overoptimize it runs slower on other tasks
heiko, I am not so convinced... won't enabling the tests we have be sufficient? I mean what is the gain?
I mean a focus on small scale tests
I write a method
I add a test for that method
like index conversion or so
store_model_features
with defined input output
the test framework is for larger scale tests
heiko, so sth like SVM cannot be tested then
I am more interested in the engineering side, not so much on ML
ok
memory errors
guaranteed input/output
this stuff
I understand then - however I can tell that these 'way too big tests' would cover these smaller ones somehow
because most errors that pop up are exactly caused by this: somebody forgot a case ..
at least the buildbot would tell us what failed
That's true, most errors are found
however, it always takes ages to debug
And so much code is not even tested at all
well it would point us to the commit that broke *a* test
In practice, errors are more complicated
for example this mkl stuff recently
but yes it would take more time to fix this than when you have such small tests
It took quite a while till I found the method that was wrong: store_model_features did a wrong job
a test would have detected that
I think so
heiko, the problem here really is that we changed so much recently that basically all the 'big' tests were no longer working
(think of serialization framework here!)
yeah I know
I think it's a lot about actually running the code with defined input/output at least once
so people would think more about stuff
rather than quickly committing as soon as the first test case worked
heiko, I can only tell that even blackburn doesn't always add an example to the code he added. And it is even more difficult with anyone submitting patches...
I know, that's much more work
people just give up
because writing tests is not a fun task
I know
but it's crucial to *any* larger software
you are right - that is why I require an example before I merge anything
it is useless currently because tests are no longer enabled
(as in not running)
Another reason why I suggest it is that we currently don't really have a place for these kind of tests
I am always committing them to examples
but actually many are not examples, way too complicated/technical
for small tests we don't yes
but rather just checks that everything works as I want it to
in python modular I think examples are nice to show how things work and are great tests too
yes they are great for detecting errors
but it's always hard to debug since you only know that this large example fails
I think it would be worth so much if you knew that a method failed
another thing: sometimes code is committed too fast. When you would have to write a test, you would think more about what you actually did there
more of an incremental development
heiko, I don't disagree - I just don't know who would write those tests
When somebody writes code, it has to be tested anyway, people do that locally
you can't program without tests
so why not adding that bit of extra work to nail it down
I suggest a rule: each new method has to get a test
just all code covered at least once
it's not that much work actually
and since they are all small scale, it's easy to write them
since you already wrote the method
heiko, yeah I use sth that I turn into an example (in python modular)
yeah like that, but only small scale
imagine every method had one of these
bug tracing would be so easy
well any method you modify and any method that gets added
and methods would be more reliable
yeah
so over the time, we collect more and more
and also, one could make it mandatory that all tests (also old ones) pass and have no mem-leaks
=less new bugs
(at least in my fantasy it's like this :)
*g* 8)
well, it's just a suggestion
Now with all the new people, might be easier than a year ago
I wouldn't mind - let's ask the others: blackburn, n4nd0, wiking - opinions on a new rule: "for any method you modify and any method that gets added a test is required"
I would add: "with defined input/output assertions"
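A small-scale test with defined input/output could look like the sketch below. It uses only plain comparisons rather than a test framework; with googletest the body of test_invert_subset would typically become a TEST(...) case using EXPECT_EQ. The method under test, invert_subset, is a hypothetical index-conversion helper invented for this example, not a Shogun method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for "this original index is not part of the subset".
const std::size_t npos = static_cast<std::size_t>(-1);

// Hypothetical method under test: map each original index to its position
// in the subset, or to npos if the subset does not contain it.
std::vector<std::size_t> invert_subset(const std::vector<std::size_t>& subset,
                                       std::size_t n) {
    std::vector<std::size_t> inverse(n, npos);
    for (std::size_t pos = 0; pos < subset.size(); ++pos) {
        assert(subset[pos] < n);
        inverse[subset[pos]] = pos;
    }
    return inverse;
}

// One small test per method: fixed input, fixed expected output.
// Returns the number of failed checks so a runner can report them.
int test_invert_subset() {
    int failures = 0;
    const std::vector<std::size_t> got = invert_subset({2, 0}, 4);
    const std::vector<std::size_t> want = {1, npos, 0, npos};
    if (got != want) ++failures;
    // Edge case: an empty subset maps every index to npos.
    if (invert_subset({}, 2) != std::vector<std::size_t>(2, npos)) ++failures;
    return failures;
}
```

When such a test fails, it names one method and one input, which is the debugging advantage over a large example that only reports that something is broken.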
how?
blackburn, leave aside the how for now.
I share heiko's opinion, I think it is a good idea that will decrease the number of bugs introduced
I don't mind
ok then - so heiko any ideas how to do that practically then?
what about a new folder new to
wiking, yes
heiko: i've started off with the following structure
add to each class a _unittest.cc implementation
that has the unittests
each class one .cpp file with tests for each method?
and then just modified the Makefile.template so that it compiles it with different flags (i.e. adding the flags for gtest/gmock)
I would rather go for each shogun .cpp file gets one *_unittest.cpp
instead of each class
nice
heiko: well that's the idea
and then you have a main_unittest.cc
but the tests should be in a different dir
that calls all the tests
heiko: i've tried that one as well
like
src/testing
like a copy of the src dir
one could symlink all the .cpp files
but it's very bizarre to have the same dir structure
just for the unittesting
imho
really? well, ok
so same dir
this way you have to create right next to your implementation
your _unittest.cpp implementation as well
but then again
yeah, but it becomes a bit messy then
many files
this can change
i'm ok with anything
i've just started off with SGVector
to define some unittest
yeah, don't know, blackburn, sonney2k, ^
yeah good place to start
like multiply, fill with zeros etc
yes
exactly
dot product
etc.
we should somehow add a memory test
heiko: that's tricky
that it *always* fails if there is a memory leak or something
heiko: unittest is not meant for that
I know
heiko: but there are various solutions
we could add a script to run all these with valgrind
like blackburn did for the examples
heiko: well in some way that's already there in examples/libshogun
the thing is: it should be automatically enabled
heiko: the problem is that gtest for example
otherwise, one forgets
does some tricky things
and thus valgrind will sometimes start crying
so valgrind complains about that?
for no reason
yes
oh
that's bad
or at least this is what i've read about
at some forums
but there are other ways
the problem is that those ones are system specific
I think we should definitely check the tests for memory in some way
i.e. depends on the OS
mh, other solutions?
so for win there's a straightforward
http://msdn.microsoft.com/en-us/library/e5ewb1h3(v=vs.80).aspx
but that's really for M$
aah way
wait!
i remember now
so yeah i forgot
boost.testing has an option
for memory leak detection
http://www.boost.org/doc/libs/1_44_0/libs/test/doc/html/execution-monitor/user-guide.html
buut that's actually m$ only as well
so i'll push soon my 'gtest based' attempt for unittest into my repo
so you can comment on it
wiking, ok great work, we can still think of how to detect memory issues
heiko: yeah i've googled some
but w/o luck
i think valgrind is still the best way to go in this case
ok, well let's see how it behaves with gtest
i mean osx has its own way as well
but that's osx only as well
and i don't think we want to handle all the OSes 1-by-1
the funny thing about gtest is actually
that you can even define what the execution time of a given test should be
so if the test doesn't run within that time it'll still report it as fail
so it has some good stuff in there ....
I agree about the OSes
maybe that's second step
basic tests first
there's probably a lot to discuss with the others on your patch
but yeah i think basic unittesting would be already a good thing to have
so that really we can point out if there's suddenly something broken
heiko: yeah i've just felt that something should be done about it and didn't really feel like discussing
we'll see when there's already something to discuss about
agreed, nice! :)
ok i'm away now for a bit to polish the example
and then i'll send a mail to the list
so that anybody can add their thoughts
great work wiking, I'll try to add something tomorrow then
heiko: okie
wiking: i'll add multiple output multiclass labels for task 1
we don't support it now actually
thanks %deity we have well designed multiclass ;)
-!- heiko [~heiko@host86-177-183-33.range86-177.btcentralplus.com] has quit [Quit: Leaving.]
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Quit: leaving]
shogun: Sergey Lisitsyn master * r2b6dcd3 / (14 files in 3 dirs): Changed task indexing way of MALSAR based algorithms and task api in general - http://git.io/5eGMZA
build #84 of bsd1 - libshogun is complete: Failure [failed compile] Build details are at http://www.shogun-toolbox.org/buildbot/builders/bsd1%20-%20libshogun/builds/84 blamelist: Sergey Lisitsyn
blackburn: yeey
wiking: what?
on your prev message
i didn't see it
shogun: Sergey Lisitsyn master * r7a0f065 / (2 files in 2 dirs): Fixes for feature blocked logistic regression - http://git.io/jtO7tA
build #85 of bsd1 - libshogun is complete: Success [build successful] Build details are at http://www.shogun-toolbox.org/buildbot/builders/bsd1%20-%20libshogun/builds/85
sonney2k, around?
bujjaaa
this works \o/
--- Log closed Sun Jul 22 00:00:17 2012