--- Log opened Wed Jan 29 00:00:08 2014
--- Day changed Wed Jan 29 2014
00:00 <@wiking> thoralf: i hope that the other unit tests (libmrm etc) are convex problems
00:00 <@wiking> as they cannot solve non-convex problems
00:00 <thoralf> At least they converge. ;)
00:01 <@wiking> well that shouldn't mean anything
00:01 <@wiking> if u test it on a non-convex example
00:01 <@wiking> that's up to luck that it does anything meaningful
00:04 <thoralf> In the linear case it shouldn't be a big deal.
00:07 <thoralf> Forget what I said.  It's simply wrong.
00:07 <thoralf> No excuse. ;)
00:08 <@wiking> okey :D
00:13 <thoralf> Let me know if you can reproduce my observation...
00:13 <thoralf> Have a good night...
00:15 -!- thoralf [] has quit [Quit: Konversation terminated!]
01:11 -!- FSCV [~FSCV@] has quit [Quit: Leaving]
03:40 -!- zxtx [] has quit [Ping timeout: 252 seconds]
03:53 -!- Skg_ [82c2a6a3@gateway/web/freenode/ip.] has joined #shogun
03:55 <Skg_> Hi, I was wondering if anybody would be able to help me out with an issue I've encountered. I've just built v3.1.1, and have encountered an issue while running through the modular tutorial ( When I try and call the function Labels, I receive the error "Cannot create new instances of the type 'Labels'" - is there anything that I might have missed that would cause such
03:59 <Skg_> Ah, as it turns out it's an issue with the tutorial - labels no longer exists, but there are binary/multiclass/regression labels available.
04:05 <Skg_> The tutorial also appears to have an issue with calling PerformanceMeasures - it looks like my shogun.evaluation does not include PerformanceMeasures.
04:25 -!- zxtx [] has joined #shogun
04:48 -!- Skg_ [82c2a6a3@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
07:23 -!- Skg_ [cba6ea41@gateway/web/freenode/ip.] has joined #shogun
07:35 -!- besser82 [quassel@fedora/besser82] has quit [Ping timeout: 276 seconds]
07:40 <lisitsyn> argh -30C here I don't wanna get out
07:52 -!- Skg_ [cba6ea41@gateway/web/freenode/ip.] has quit [Quit: Page closed]
08:01 -!- besser82 [quassel@fedora/besser82] has joined #shogun
08:37 -!- lisitsyn [~lisitsyn@] has quit [Ping timeout: 252 seconds]
08:56 -!- benibadman [~benibadma@] has joined #shogun
09:00 -!- zxtx [] has quit [Ping timeout: 252 seconds]
09:09 -!- zxtx [] has joined #shogun
09:21 -!- lisitsyn [] has joined #shogun
10:06 -!- benibadm_ [~benibadma@] has joined #shogun
10:09 -!- benibadman [~benibadma@] has quit [Ping timeout: 272 seconds]
10:38 -!- benibadman [~benibadma@] has joined #shogun
10:40 -!- benibadm_ [~benibadma@] has quit [Ping timeout: 245 seconds]
11:10 -!- benibadm_ [~benibadma@] has joined #shogun
11:13 -!- benibadman [~benibadma@] has quit [Ping timeout: 260 seconds]
11:16 <@wiking> lisitsyn: :D
11:16 <@wiking> lisitsyn: get a jacket :)
11:19 <lisitsyn> wiking: every cloth is double now :D
11:22 <@wiking> lisitsyn: so change or not to change that is the question *now*
11:40 -!- votjakovr [~votjakovr@] has joined #shogun
12:23 <@wiking> we need a spark interface :)
13:07 <@wiking> lisitsyn: here?
13:14 -!- HeikoS1 [] has joined #shogun
13:24 <@wiking> HeikoS1: yo
13:50 <lisitsyn> wiking: ya
13:52 <@wiking> lisitsyn: have u checked mesos or spark?
13:54 <lisitsyn> wiking: hmm no
15:37 <HeikoS1> wiking: yo!
16:02 -!- lisitsyn [] has quit [Quit: Leaving.]
16:12 <@wiking> HeikoS1: ok so the plan for the new backend is actualy this: spark or mesos
16:13 <@wiking> check the Interactive Use part
16:14 <@wiking> basically depending on a env variable
16:14 <@wiking> it'll run your task either on a spark cluster (or mesos) or on 4 local cores :P
16:32 <HeikoS1> wiking: looks and sounds good, I will check this  out
17:13 -!- lisitsyn [~lisitsyn@] has joined #shogun
17:43 -!- benibadm_ [~benibadma@] has quit []
18:32 -!- iglesiasg [] has joined #shogun
18:32 -!- mode/#shogun [+o iglesiasg] by ChanServ
18:33 -!- shogun-notifier- [] has joined #shogun
<shogun-notifier-> shogun: Parijat Mazumdar :develop * ccd01c4 / / (12 files):
18:33 <shogun-notifier-> shogun: minor changes in KMeans
<shogun-notifier-> shogun: Fernando Iglesias :develop * 1bc8681 / / (12 files):
18:33 <shogun-notifier-> shogun: Merge pull request #1831 from mazumdarparijat/fastkmeans
18:33 <shogun-notifier-> shogun: minor changes in KMeans
18:44 -!- lambday [67157c4d@gateway/web/freenode/ip.] has joined #shogun
18:44 <lambday> HeikoS1: hi... there?
18:44 <HeikoS1> lambday: hi!
18:44 <HeikoS1> how are things?
18:44 <lambday> HeikoS1: hi.. things are good...
18:45 <lambday> HeikoS1: I was studying about multithread things.. and bumped into things like thread pooling... as lisitsyn was saying...
18:45 <lambday> HeikoS1: I was working on a prototype..
18:45 <lambday> which maintains a job queue
18:46 <lambday> and there are a bunch of working threads.. which wait for jobs to be available on that queue
18:46 <lambday> and as soon as they get jobs... they wake up and start executing
18:46 <lambday> this is the idea we initially had
18:46 <lambday> oh but before that.... your exams are over?
18:46 <HeikoS1> lambday: yeah that was the idea
18:47 <HeikoS1> lambday: exams are over
18:47 <HeikoS1> finally :)
18:47 <lambday> HeikoS1: how was it?
18:47 <lambday> hehhe :D
18:47 <HeikoS1> lambday: mixed
18:47 <HeikoS1> lambday: so yeah this could be a nice system of implementing the multithread stuff
18:47 <HeikoS1> maybe something easier would be good to start with as a prototype for the interfaces
18:47 <lambday> HeikoS1: yeah... and from what I checked out, c++11 has a really cool library for multithread things
18:48 <lambday> HeikoS1: and IMO its better if we could use that... cause pthread is too system specific...
18:48 <HeikoS1> lambday: I know, that might be useful
18:48 <HeikoS1> lambday: I dont know whether we are allowed to do that
18:48 <HeikoS1> lisitsyn: might have an answer to it :)
18:48 -!- HeikoS1 [] has left #shogun []
18:48 -!- HeikoS1 [] has joined #shogun
18:48 <lisitsyn> allowed to do what?
18:48 <lambday> HeikoS1: if the computation engine part is drawn out of shogun and put into an external library, then probably it can be...
18:49 <lambday> and we don't extend SGObject for these classes
18:49 <lambday> may be then??
18:49 <HeikoS1> lambday: we will still need an internal wrapper for things
18:49 <HeikoS1> so the first step is to write a very very easy multithread implementation
18:49 <HeikoS1> to finilise the interfaces
18:49 <HeikoS1> and then once that works think about the backend
18:50 <lambday> HeikoS1: currently what I wrote is an extension of CIndependentComputationEngine... (not fully working yet)
18:50 <lambday> but it pretty much does what we earlier discuss about
18:50 <lambday> calls the compute on job...
18:50 <HeikoS1> lambday: ok that sounds good
18:50 <HeikoS1> lambday: so, why dont we start with an openmp backend
18:50 <HeikoS1> that seems very simpole
18:50 <lambday> but for the "wait_for_all" method, it would have been really nice if we could have used c++11's future/promise things
18:50 <HeikoS1> with very simple queuing
18:50 <HeikoS1> lambday: ah yeah
18:51 <HeikoS1> lambday: maybe its a good idea to change that then
18:51 <lambday> HeikoS1: but it all depends on whether we can support c++11's std::thread
18:51 <lisitsyn> why not?
18:52 <HeikoS1> lisitsyn: is shogun fully c++11?
18:52 <HeikoS1> then lets use that
18:52 <HeikoS1> as a first backend
18:52 <lisitsyn> HeikoS1: what's fully c++11?
18:52 <lambday> HeikoS1: we can check for its support
18:52 <lambday> like currently it does for a bunch of c++11 features
18:52 <HeikoS1> but still lets keep the interface in a way that things can be done from the modular interfaces
18:53 <lisitsyn> afaik it is supported on major compilers
18:53 <lisitsyn> so I'd not be afraid of it
18:53 <lambday> anyone with g++4.7+ should have it
18:53 <lambday> vc++ also supports std::thread now...
18:53 <lisitsyn> you probably know boost has threadpool
18:54 <lisitsyn> I bet it is worth to take some ideas from it
18:54 <lambday> haven't used it though
18:54 <lambday> HeikoS1: lisitsyn: so do you think I should send a PR with the current implementation of my thread pool things... and then may be we can discuss?
18:54 <lisitsyn> yeah sure
18:55 <lambday> I'll finish it soon then
18:55 <lisitsyn> just let me know when to take a look
18:55 <HeikoS1> lambday: yeah maybe thats a good idea
18:55 <lisitsyn> cause I am a bit
18:55 <HeikoS1> lambday: a working example (like kernel matrix) would be good also
18:55 <lambday> lisitsyn: hehe :D
18:55 <lambday> HeikoS1: yeah I thought of that...
18:55 <HeikoS1> since again interfaces are important, how to specify the parts
18:55 <HeikoS1> lambday, lisitsyn also talk to viking about the spark stuff he mentioned
18:55 <lisitsyn> damn it is fcking cod
18:55 <lambday> HeikoS1: another thing... the job.... I think we should use a callable object as job instead of making it a class...
18:56 <lambday> spark stuff?... sorry I am not updated :(
18:56 <lambday> lisitsyn: -10 degrees? :P
18:56 <lisitsyn> lambday: -30C
18:56 <lambday> fuck!! :-o
18:57 <lambday> HeikoS1: if we use job as a callable object, we can use std::packaged_task, functors, functions etc...
18:58 <lambday> in many cases making it a subclass of some generic job class would cause a lot of overhead
18:58 <lisitsyn> one great drawback of std threading stuff
18:58 <lisitsyn> in java you just extend it
18:58 <lambday> lisitsyn: what is it?
18:58 <lisitsyn> no way here
18:58 <lisitsyn> you just either use it or fck you
18:58 <HeikoS1> lambday: yeah an object might be better
18:58 <HeikoS1> so its easier to use things
18:58 <HeikoS1> I agree
18:59 <lambday> lisitsyn: we'll have to think of some trade off then :-/
18:59 <lambday> HeikoS1: I was thinking of separating the computation part and do the experiments... and then post it somewhere, so that we can discuss...
18:59 <lambday> HeikoS1: to do the changes in the existing framework
19:00 <lambday> later we'll integrate
19:00 <lisitsyn> it would be cool to have it more implementation agnostic
19:00 <lisitsyn> I mean
19:00 <lisitsyn> std::thread won't work with mpi
19:00 <lisitsyn> but we can imagine thread is just mpi kind of thread
19:01 <lambday> ummm.. std::thread on linux uses pthread, right?
19:04 <lisitsyn> lambday: should be
19:04 <lambday> for a thread pool kind of scenario, we gotta spend some time on the job scheduling part... may be job queue won't be just a queue... but a priority queue or some other fancy DS
19:04 <lambday> cause not all jobs we'll be using would be independent
19:04 <lambday> but initially a simple queue would do..
19:05 <lambday> lisitsyn: yeah I think it does..
19:05 <lisitsyn> lambday: well it should have some queue but shouldn't know what queue
19:06 <lambday> ya... but there has to be some mechanism to specify our required computation order.... may be a DAG?
19:07 <lambday> thread pools would be computing different types of jobs
19:07 <lambday> I mean the threads in the pool
19:07 <lisitsyn> lambday: heh well yeah
19:07 <lisitsyn> it is the questiong
19:09 <lambday> ya... somehow we gotta make it easy for the application programmer to just specify their required order... and submit different types of jobs to the engine... then just wait for all...
19:09 <lambday> ideally it should be that simple... :-/
19:09 <shogun-buildbot> build #48 of deb4 - python3 is complete: Failure [failed test python modular]  Build details are at  blamelist: Fernando Iglesias <>, Parijat Mazumdar <>
19:10 <lambday> but anyway I will complete the current implementation with a simple queue (using std::queue) and then discuss may be?
19:13 -!- thoralf [] has joined #shogun
<shogun-buildbot> build #411 of precise - libshogun is complete: Success [build successful]  Build details are at
19:18 <HeikoS1> lambday, lisitsyn keep in mind the thing is called *independent jobs*
19:18 <HeikoS1> so thats just embarassingly parallel stuff
19:19 <lambday> HeikoS1: well, that would make it pretty simple then... I was wondering if we can use it in some other cases.. when a bunch of independent jobs has to be followed by another bunch of independent jobs..
19:20 <lambday> but initially that's too much to worry about :-/
19:22 <lambday> HeikoS1: things will work from modular interface as well.. we'll specify jobs from the modular interface as well.. engine will be there.. shouldn't matter what they use in the backend..
19:23 <HeikoS1> lambday: yeah ordering would be cool
19:23 <HeikoS1> lambday: but I think its kind of hard to specify that within the framwork
19:23 <HeikoS1> why not write the code this way
19:23 <HeikoS1> that uses the independent-for framework
19:23 <HeikoS1> sorry for my long delays  here btw ;)
19:23 <lambday> HeikoS1: no worries man!..
19:24 <lambday> HeikoS1: well, currently what we had in mind can only compute one bunch of independent jobs.. and then wait for all...
19:24 <HeikoS1> lambday: yes
19:24 <lambday> if we need another bunch... we'll have to create another engine..
19:24 <HeikoS1> yep thats true
19:24 <HeikoS1> ah you mean you want to combine things?
19:24 <lambday> HeikoS1: I was wondering if we can design it someway that we can use the same engine...
19:25 <HeikoS1> lambday: but thats just queuing isnt it?
19:25 <HeikoS1> since the engine has a certain capacity
19:26 <lambday> HeikoS1: but can we guarantee that a job from  job grup J1 {j11,j12...j1n} has computed *before* J2 {j21,j22,....j2n} in that case?
19:27 <lambday> I mean, say, if the result from job group J1 is used by the job group j2
19:27 <HeikoS1> lambday: I see
19:27 <HeikoS1> currently this is only possible via wait_for_all
19:27 <lambday> but then also, we cannot check before computing some job of the second batch if the required result has already been computed
19:27 -!- parijat [671b082a@gateway/web/freenode/ip.] has joined #shogun
19:28 <lambday> so if we want that flexibility...
19:28 <HeikoS1> lambday: I see
19:28 <HeikoS1> I feel we should start with a replacement for the current parallel stuff
19:28 <HeikoS1> which is all just parfor basically
19:29 <HeikoS1> no dependencies
19:29 <HeikoS1> we can always add that
19:29 <lambday> parfor as in openmp style?
19:29 <HeikoS1> yeah just tasks that have independent parts
19:30 <HeikoS1> to unify this within shogun would already be a great improvement
19:30 <HeikoS1> lisitsyn: how would you represent categorical features in shogun?
19:30 <HeikoS1> lisitsyn: as strings?
19:31 <lisitsyn> HeikoS1: I think map<int,string> + int
19:32 -!- parijat [671b082a@gateway/web/freenode/ip.] has quit [Client Quit]
19:33 <lambday> HeikoS1: I'll check this then.. currently keeping in mind regarding the kernel matrix computation part... if I can make this example working.. then voila!
19:34 <lambday> HeikoS1: instead of putting things in thread_param and get_kernel_matrix_helper, we'll specify it as job.. and inside the get_kernel_matrix, create jobs and submit to the engine..
19:35 <lambday> this is what I had planned
19:35 <lambday> the main code would be easy to read..
19:36 <lambday> plus memory things can be handled using std::ref and std::move ...
19:37 <lambday> HeikoS1: lisitsyn: will brb after dinner
19:39 <HeikoS1> lambday: good plan
19:40 <HeikoS1> yeah if this works, we can translate a few of shoguns parallel things to this much cleaner approach
19:40 <HeikoS1> and if thats fine,  then we can start thinking about x-validation on a batch cluster backend like thing
19:49 -!- travis-ci [] has joined #shogun
<travis-ci> [travis-ci] it's Fernando Iglesias's turn to pay the next round of drinks for the massacre he caused in shogun-toolbox/shogun:
19:49 -!- travis-ci [] has left #shogun []
20:01 -!- iglesiasg [] has quit [Quit: Leaving]
20:13 -!- thoralf [] has quit [Ping timeout: 272 seconds]
20:17 <@wiking> lambday: i've just suggested that maybe we should start moving on with the computing framework using spark...
20:17 <@wiking> lambday: or at least do a mesos implementation of it
20:26 <lambday> wiking: spark.. okay
20:28 <lambday> wiking: so do you think it would be possible to come up with unified computation framework for multithread and cluster? like.. from an application programmer's point of view.. they will just have to use different engines... that's all
20:30 <@wiking> lambday: well according to spark
20:30 <@wiking> that's doable
20:31 <@wiking> check the "Interactive Use" part
20:31 <@wiking> you can either use spark like: MASTER=spark://IP:PORT ./pyspark
20:31 <lambday> wiking: checking
20:31 <@wiking> MASTER=local[4] ./pyspark
20:31 <@wiking> for 'use four cores on the local machine'
20:32 <lambday> oh that's classic!
20:32 <@wiking> so depending on the MASTER env variable
20:32 <@wiking> u use various 'cluster' solutions
20:33 <@wiking> which is what we want
20:33 <@wiking> of course we'll have several drawbacks for this
20:33 <@wiking> like creating a parallel for loop with openmp is very easy
20:33 <@wiking> #pragma etc...
20:34 <@wiking> but this way we'll have to implement our own way to do this
20:34 <@wiking> to be able to handle both cases
20:34 <@wiking> 1) different cores but on one machine
20:34 <@wiking> 2) totally separated computing nodes
20:35 <@wiking> so we'll have to have something like a function
20:35 <@wiking> to do a parallel for loop
20:37 <@wiking> and of course the biggest problem is that spark doesn't have c++ api :
20:37 <lambday> wiking: yeha....
20:38 <lambday> wiking: yeah I was just about to say that
20:38 <@wiking> mesos does have
20:38 <lambday> everything has to be on py or java.. :-/
20:38 <@wiking> lambday: mesos is c++ implemented so we can use that
20:38 <@wiking> but then we loose all the good stuff of spark
20:39 <lambday> wiking: no way we can use it backwards? :P
20:39 <@wiking> lambday: "If you want to pass data through an external C++ program, there's already an operation on Spark's distributed datasets (RDDs) called pipe(). You give it a command, and it will launch that process, pass each element of the dataset to its stdin, and return an RDD of strings representing the lines printed to the process's stdout."
20:39 <lambday> provide some wrappers and use swig?
20:39 <@wiking> lambday: well java has JNI
20:39 <@wiking> so it's possible of course
20:39 <@wiking> but that's really a pain in the ass
20:39 <@wiking> i've been working with JNI and it's really hell
20:41 <lambday> wiking: hmm... but the way you said it can both be used for multicore and cluster.. is really cool!..
20:42 <lambday> so for that much flexibility I wonder if we can take the pain
20:42 <lambday> I don't have much exp with JNI and not spark for that matter but I can check
20:42 <@wiking> i'll have to check this pipedrdd
20:43 <@wiking> we could write the serializer for this :))))
20:44 <lambday> wiking: I'll check it out... play with it a bit
20:45 <lambday> lol python guide asks to read the scala part first :D
20:46 <@wiking> mesos seems to be easy of course
20:57 <lambday> wiking: HeikoS1: I will come back in a few mins...
21:07 <@wiking> kind of hard with the documentation
21:16 <lambday> HeikoS1: wiking back
21:16 <lambday> wiking: lol.. chinese?
21:16 <lambday> lol we shouldn't worry about that.. there are people among us who can help us with it :D
21:19 <lambday> wiking: would this help?
21:21 <@wiking> mmm well that's just how to run spark on mesos
21:21 <@wiking> but that doesn't help us with having spark
21:23 -!- thoralf [] has joined #shogun
21:27 -!- votjakovr [~votjakovr@] has quit [Quit: WeeChat 0.4.0]
21:33 -!- shogun-notifier- [] has quit [Quit: transmission timeout]
22:18 -!- iglesiasg [] has joined #shogun
22:18 -!- mode/#shogun [+o iglesiasg] by ChanServ
22:33 -!- lambday [67157c4d@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
22:46 -!- chetna [0e8b5206@gateway/web/freenode/ip.] has joined #shogun
22:57 <chetna> Hello :) I am really enthusiastic to contribute in the field of machine learning and projects on similar lines. Going through the idea page for GSoC-2013 I found many ideas which are related to the coursework I have persuaded back in my college. I would like to know how can I start working and hacking into the codebase for Shogun.
22:58 <@iglesiasg> chetna, Hey, I suggest you to have a look to the initiation issues in GitHub
22:59 <chetna> iglesiasg: Could you please direct me to the link for the same.
23:00 <@iglesiasg> click on the "entrance" label (initiation as I said before is not the right word)
23:00 <@iglesiasg> you will find the label on the left hand side of the page
23:02 <chetna> iglesiasg: Thanks :) I'll look through the issues with the above mentioned tag :)
23:04 <@iglesiasg> chetna, make a comment in the issue if you decide to start working on one, just so we avoid wasting manpower replicating work
23:04 <chetna> iglesiasg: For Sure :)
23:15 -!- chetna_ [0e8b5206@gateway/web/freenode/ip.] has joined #shogun
23:17 -!- chetna [0e8b5206@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
23:28 -!- lisitsyn [~lisitsyn@] has quit [Ping timeout: 245 seconds]
23:55 -!- chetna_ [0e8b5206@gateway/web/freenode/ip.] has quit [Ping timeout: 245 seconds]
--- Log closed Thu Jan 30 00:00:16 2014