I'm an aspiring "Data Scientist", and I'd love to cut my teeth on some data that I'm highly interested in. Primarily, I'd like to develop some predictive models, and some clusters as well. Is there any publicly available data around this? If not, are there any background processes that I could run to capture my own behavior?
There are many good tools online for getting started here. If you want to dive right in, a decent place to start is http://mahout.apache.org/, and many sample data sets are available to play with at http://archive.ics.uci.edu/ml/
Otherwise, start by brushing up on your basic statistics: std dev, correlation, variance, confidence, histograms, etc. Two really good entry-level books to whet your appetite:
"Data Mining: Concepts and Techniques" (Jiawei Han, Micheline Kamber, Jian Pei)
"Data Analysis with Open Source Tools" (Philipp Janert)
This game would be a very interesting data mine. The psychological behavior of the people playing, their habits and opinions, could fill several doctoral theses. It reminds me of Plato in a way: video games and their effect on the psychology of individuals and society. A correlation would need to be made between television and video games... brain scans, effects on the nervous system and development of brain tissue. I bet you could spend a hundred years on the subject.
Edited by CuriousDecoy#1997 on 6/12/2012 4:42 AM PDT
did you know that the brain runs on sugar and oxygen?
You know how as a child you like to look at eccentric patterns and colors, such as a neon key chain or a checkerboard? The child's blurred vision gets better as the optic nerve connections rearrange into better ones, making what the eyes see less blurry. The same can be said for the brain: when it is stimulated through puzzle solving (video games), regardless of what the puzzle is or how it's meant to be solved (an obstacle is an obstacle), the brain's nerves do the same, making better connections and increasing IQ, but at a slow pace of course... you're not gonna play Tetris once and gain 20 IQ.... /cough
However, I'm not too sure data mining a game will provide sources of variables for your thesis... playing the game, sure; knowing the game mechanics so you can hack it, mmmm, not so much. Maybe you can share some of your findings? Would be interesting. I would recommend studying the human behavior (playing the game) first before I would even consider data mining, which is irrelevant imo, but hey, everyone says I'm just a dumb@#$, what do I know?
I would consider doing long-term observation of a cavia >.> as subconsciousness could (should) be a strong variable. As for watching yourself, try a webcam or put cameras in your house; it would also help to ask others (spouse) to take notes, especially on moods... I'm sure you know all of this already...
You could probably write a program to poll the D3 servers for hours played, achievements, items, and other publicly available player information, though I suspect you won't find anything groundbreaking there, and it's probably a violation of the EULA.
Assuming they become available online it becomes easier to collect the data, but see the comment about probably nothing interesting. Could be fun to represent the data visually in nice ways, but for what you're after you'll probably just find that they're all fairly directly related, just as one would intuit.
For large, interesting datasets try http://www.sigkdd.org/kddcup/index.php which has a variety of data sets from past competitions.
http://data.nasa.gov/ has some cool stuff as well.
*edit*Really? I can't write scra pe because it has a bad word in it? And here I thought language filters had evolved past 90's era naive search and replace.*
Edited by Jemima#6366 on 6/17/2012 12:54 AM PDT