Thursday, February 08, 2007

Crystal Ball

okie folks...the cricket world cup is a month away...i was looking at cricinfo today and just realized that there is a wealth of data available for download...r u the betting type??? was wondering if there are any takers to look at the data and kind of snoop to see if we can get some sort of predictions...if there are any takers do post here...mebbe we can run a blog or something with what we come up with...if we can come up with a model, i will take the responsibility of maintaining and updating the prediction system...

i dont have anything particular in mind...what we have is the data...what we need is ideas....any takers?????

9 comments:

Suresh Sankaralingam said...

I think this will need a lot more thinking...:)... Well, if I understand you correctly, are you planning to create a model based on team-performance/player statistics/... and so on to predict each match's outcome and bet accordingly? When you say updating and maintaining, I hope you are talking about the information from earlier matches in the world cup being fed to your model...

If all I am saying above is correct, what is the information we are looking for? Is there a dilemma in terms of the variables and parameters that one needs to choose to feed to the model?

Mad Max said...

@ Mindframes: uve hit the nail on the head...we need to think about this hard...

well we can take it to any extent but I was thinking more about trying to forecast the performance of the indian team and maybe even individual players...for instance consider the case of Robin Uthappa..the guy has scored tons of runs in the domestic circuit but can he reproduce that in international circuit...we could think of player specific models since we have statistics such as runs scored, balls faced etc. we also have information on categorical variables like right/left and we can construct variables such as attacking/defensive etc etc depending on how we want to model..

of course i'm sure it will be a very crude system in the beginning, but i'm kind of thinking of a model..if we are to think of it in a regression setting, the obvious choice is a poisson regression model (the easiest to implement). so we can have say a dependent variable like the expectation on runs scored as a function of some variables which we can feed in.
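A minimal sketch of the Poisson regression idea above, in plain Python with no extra packages. Everything in it is an assumption for illustration: the team totals are made up, the single explanatory variable (batted first or not) is just one candidate, and the Newton-Raphson fitting routine is a textbook approach rather than what any particular package does.

```python
import math

def solve_linear(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def expected_runs(beta, row):
    """Expected runs under a log link: exp(beta . row)."""
    return math.exp(sum(b * x for b, x in zip(beta, row)))

def fit_poisson(X, y, iterations=25):
    """Fit a log-link Poisson regression by Newton-Raphson.

    Each row of X must start with 1 for the intercept.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the log of the overall mean
    for _ in range(iterations):
        mu = [expected_runs(beta, row) for row in X]
        # Gradient of the log-likelihood: sum of (y - mu) * x.
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu))
                for j in range(p)]
        # Negative Hessian: sum of mu * x * x'.
        hess = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Made-up team totals; the second column is 1 if the team batted first.
X = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0], [1, 0]]
runs = [250, 270, 230, 210, 190, 200]
beta = fit_poisson(X, runs)
print(expected_runs(beta, [1, 1]))  # expected total when batting first
print(expected_runs(beta, [1, 0]))  # expected total when batting second
```

With a single yes/no variable like this, the fitted expectations are just the two group averages (about 250 and 200 for these numbers), so the model only starts earning its keep once several variables are fed in together.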

now the difficulties arise when we want to model conditional performance. for instance performance variables could potentially change across teams, across pitch type, across day/night fixture, across place in the batting order, across batting first/second etc etc.
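One common way to let performance depend on conditions like these is to turn each categorical condition into 0/1 indicator columns before feeding it to a regression. The sketch below assumes hypothetical condition names and levels (`pitch`, `day_night`, `innings`); none of them come from the cricinfo data, which would have to be checked first.

```python
# Hypothetical match conditions; the first value in each list is the baseline.
LEVELS = {
    "pitch": ["flat", "green", "turning"],
    "day_night": ["day", "day_night"],
    "innings": ["bat_first", "bat_second"],
}

def design_row(match, levels=LEVELS):
    """Turn one match's conditions into a row of 0/1 indicators.

    The row starts with 1.0 for the intercept. The baseline level of each
    condition gets no column of its own, so it is absorbed by the intercept.
    """
    row = [1.0]
    for name, values in levels.items():
        for value in values[1:]:
            row.append(1.0 if match[name] == value else 0.0)
    return row

def column_names(levels=LEVELS):
    """Names of the columns produced by design_row, in the same order."""
    names = ["intercept"]
    for name, values in levels.items():
        names.extend(f"{name}={value}" for value in values[1:])
    return names

match = {"pitch": "turning", "day_night": "day", "innings": "bat_second"}
row = design_row(match)
for label, value in zip(column_names(), row):
    print(label, value)
```

Each extra level adds a column, so with only a limited number of past matches it probably pays to keep the list of conditions short.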

so basically what I have in mind is

a) decide whether we do a team or a player model. Team would be my preference because a player model will be harder to maintain

b) choose the variables which could potentially explain performance of a team. this being a brainstorming session we can throw in as many variables into the mix

c) once we have a comprehensive list, let us take a detailed look at the cricinfo database to see what information is available

d) once we have this information we can decide on what modeling strategy to adopt.

e) then it is a matter of implementation and updating based on results of individual games
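Step (e) could be as simple as refitting the model after every game. As a rough alternative, the sketch below keeps a running belief about a team's average total and nudges it after each result. It assumes a Gamma prior on a Poisson scoring rate (a standard conjugate pair, swapped in here for illustration and not something settled in this thread), and the prior values and match totals are made up.

```python
from dataclasses import dataclass

@dataclass
class ScoringRate:
    """Gamma(shape, rate) belief about a team's mean runs per match."""
    shape: float  # behaves like "total runs seen so far"
    rate: float   # behaves like "number of matches seen so far"

    def update(self, runs):
        """Return the belief after observing one more match total."""
        return ScoringRate(self.shape + runs, self.rate + 1)

    @property
    def mean(self):
        """Current estimate of the mean runs per match."""
        return self.shape / self.rate

# Hypothetical prior: worth two matches of evidence at 225 runs each.
belief = ScoringRate(shape=450.0, rate=2.0)
for total in [250, 190]:  # made-up world cup results, in order played
    belief = belief.update(total)
    print(belief.mean)
```

The `rate` value in the prior controls how quickly new results override past form: a small value lets the tournament games dominate, a large one keeps the estimate close to the historical average.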

what say folks?

Suresh Sankaralingam said...

My regression and modelling experience is limited to using MS-Excel, which I did as part of some course work. All we did was to look at all the variables and remove them one by one based on their contribution to the model (coefficients, linear, log-linear etc.) and finally arrive at an equation which best fit the data. I thought Excel was very powerful for such things. Is it the same modelling s/w or approach you use?
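The remove-one-at-a-time procedure described above is usually called backward elimination. Here is a rough sketch of it with ordinary least squares, assuming adjusted R-squared as the yardstick for "contribution" (the course work may well have used coefficients or p-values instead). The variable names and numbers are made up.

```python
def solve_linear(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    beta = solve_linear(xtx, xty)  # normal equations
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def adjusted_r2(columns, y):
    """Adjusted R-squared, which penalises every extra variable."""
    n, p = len(y), len(columns)
    mean = sum(y) / n
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1 - (rss(columns, y) / (n - p - 1)) / (tss / (n - 1))

def backward_eliminate(data, y):
    """Drop variables one at a time while adjusted R-squared keeps improving."""
    kept = list(data)
    while kept:
        current = adjusted_r2([data[k] for k in kept], y)
        trials = [(adjusted_r2([data[k] for k in kept if k != drop], y), drop)
                  for drop in kept]
        best_score, best_drop = max(trials)
        if best_score <= current:
            break  # no single removal improves the fit measure
        kept.remove(best_drop)
    return kept

# Made-up data: totals driven mostly by run rate, with a toss flag thrown in.
data = {
    "run_rate": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 4.5, 6.5],
    "won_toss": [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
}
totals = [150, 210, 245, 310, 340, 405, 220, 330]
kept = backward_eliminate(data, totals)
print(kept)
```

The same loop works with any other score in place of `adjusted_r2`, as long as a higher value means a better model.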

I found an article by googling. Apparently, it is used to estimate the winner based on overs left, wickets left, past performances and so on. You can read it in the following website. Might shed more light...

www.jssm.org/vol5/n4/2/v5n4-2pdf.pdf

Mad Max said...

@ Mindframes: thanks for the link will read it..

As for software...no i dont use excel for professional work..it is not reliable in terms of algorithms to estimate the test statistics (t, F etc)..plus it has a major limitation that it can only accommodate the least squares regression type.

for my work the main software package i use is R (which is open source)...any software which allows easy vectorization works great for statistics...other software which i like to use depending on applications are Matlab, Gauss and SAS...

I dont know how to code in C++...planning to take a class sometime this year becoz computing speed increases dramatically (that is the claim)..i guess it depends on how efficiently one writes the code...but neways no harm trying...hehehe

Suresh Sankaralingam said...

Well if you have a full-fledged package that does what you want, it is fine. Unless you want to try out small experiments, I am not sure if it is worth the time to write a complete s/w package in C++.

I am not sure if I will be of any use since you are a statistics expert (other than giving moral support and betting money on your model..). But, if you need any specific thing that you want to try out, we can work together. I can code C++...;)

Mad Max said...

@ mindframes: no i was not planning on C++ for this idea..this is just for fun right...but just that it might be helpful for me from a professional perspective to learn how to do stuff in C++...

i read the article that you had mentioned..it is interesting though i would tend to be a little skeptical about the results...lets think about it a bit more..

how about this...why not pen down what factors u think will affect a game...lets just hit the variables first..we will think about the modeling aspect later...anything that comes to mind is fine

Suresh Sankaralingam said...

sounds good...

sdpal said...

You guys could've talked over phone..

Manohar said...

@sdpal: I agree- that was a phone conversation there......
:)