WoW Almanac

90 Night Elf Priest
12995
Hi all,

I've been working on a little WoW site for a while now, tapping into blizzard's API. I'm only really working on it with freetime I have, which lately with work and Diablo 3 is almost no time at all.

So instead of waiting to 'release' the site until I have implemented all the features, I'm going to release what I have now and tag it BETA, and add new features as I get time down the road. Why not right? (technically it should probably be alpha, but whatever).

I'm only collecting data for 3 realms right now. I've yet to receive an api key, so it's the best I can do for right now. It's ok though because I'm still ironing out various bugs and trying to implement more features.

NOTE: I am not a pro web developer, the site kind of looks crappy, but that will be fixed down the road. I'm more of a back end data developer, so front end design is something I suck at.

The technologies I'm using are:
python (for all data collecting, processing, and even the website)
django (for the website)
PostgresSQL database

The whole service is broken up into 3 'sections' if you will:
1. Data collection. I collect data for realms, and update characters / guilds data when it's been updated by the API. I do this on a sort of "schedule", so it's not updated right away, there will be lag from a character getting an achievement, and it appearing on WoW Almanac.

2. Data preparation. I take the data, and perform various grouping or calculations on it. For example: creating an ordered list of everyone on the realm, ordered by achievement points.

3. Present the processed data on the website. Self explanatory ^_^.

My site is http://www.wowalmanac.com/, and there are two features right now:

1. Realm Stats
This feature will be expanded a ton down the road, but for right now I cover some basic things: Top achievement points by character, top achievement points by guild, top pet collectors, top mount collectors.

In the future, I plan to add more categories, such as a list of people with the Insane title, or super rare mounts, as well as PvP stats.

I also plan on creating an image that will show your current rank on the realm for whatever category, so it can be used in forum signatures and whatnot. (For example, #1 on realm for achievement points). After all, you should be able to brag about it right?

2. Job Finder
This is kind of a stupid name, but I couldn't think of anything else. Basically this will let you find who on the realm can craft a specific item. For example, lets say I want to find someone on Perenolde - alliance side that can craft Endless Dream Walkers. I would enter that into the search, and would be directed here, http://www.wowalmanac.com/jobfinder/us/Perenolde/alliance/Endless%20Dream%20Walkers/

The Job Finder section needs a lot of work, but basic functionality is working now.

3. (Future release). I'm collecting auction house data too, the undermine journal (https://theunderminejournal.com/) already does an amazing job at this, so I may not want to re-invent the wheel. If I can think of anything I can create that the undermine journal doesn't do, then I'll work on that too.

In accordance with the API ToS, I've created a github repository with the function I use to connect to the api. It's a very simple function, and will be updated sometime to actually be a class instead of a function. I'm teaching myself Python through this process, so there will be evidence of that here. >.> <.<

https://github.com/BrianFehrle/WoW-Almanac-API

I started collecting this data for a couple of reasons. First, I was curious about how rare a specific mount was, stuff like that. But then I realized this data may be found interesting by other people also. Second, I want to improve my skills in the various technologies I used, such as python, Data Modeling, database development, django, etc... I've learned a lot in this process, so even if the site is stupid, I don't care, I got something great out of it.

Let me know what you think, if it sucks, something is broken, ideas for new features or pieces of data to present, anything. I'd love to hear your thoughts.

- Rukiak @ Perenolde
Edited by Rukiak on 6/3/2012 2:30 PM PDT
Reply Quote
1 Troll Rogue
0
06/03/2012 02:24 PMPosted by Rukiak
I'm only collecting data for 3 realms right now. I've yet to receive an api key, so it's the best I can do for right now. It's ok though because I'm still ironing out various bugs and trying to implement more features.


To get mount/pet/profession data, you'll have to query every character individually. If you want to update every character at least once per month, you're looking at over 600,000 requests per day per region, or 7 requests per second. I think the API key is typically limited to 50,000 requests per day. How will you handle so many requests, or will updates be less frequent than monthly, or..?

NOTE: I am not a pro web developer, the site kind of looks crappy, but that will be fixed down the road. I'm more of a back end data developer, so front end design is something I suck at.


I can totally relate. :) Aesthetic design is hard.

1. Realm Stats
This feature will be expanded a ton down the road, but for right now I cover some basic things: Top achievement points by character, top achievement points by guild, top pet collectors, top mount collectors.

Your character achievement rankings match what I have in Realm Pop, so that bit looks good. I thought about doing achievement ranks, but then figured that WoW-Achievements.com had that covered. Looks like their crawler is quite a bit delayed, since your stats and mine have higher achievement points than what they report.

In the future, I plan to add more categories, such as a list of people with the Insane title, or super rare mounts, as well as PvP stats.


That's potentially some very big data. If you were to track all achievements earned, for example, that's hundreds of millions of rows. Then add professions and individual mounts (instead of just mount counts) and that's a huge amount, too. I'd like to see how those queries return in a reasonable time. ;) Then again, you could run them ahead of time on a schedule, and just return static pages, like I do with Realm Pop. It takes about an hour to generate all the realms' pages.

2. Job Finder
This is kind of a stupid name, but I couldn't think of anything else. Basically this will let you find who on the realm can craft a specific item. For example, lets say I want to find someone on Perenolde - alliance side that can craft Endless Dream Walkers. I would enter that into the search, and would be directed here, http://www.wowalmanac.com/jobfinder/us/Perenolde/alliance/Endless%20Dream%20Walkers/


A website called FreeWithMats used to do this, before the API, and its users had to run an addon to update the website manually. I made a Greasemonkey script to mash it with Wowhead back then: http://everynothing.net/whfwm/

I've been waiting for someone to pick up the reins and try this out with the API. Good stuff.

3. (Future release). I'm collecting auction house data too, the undermine journal (https://theunderminejournal.com/) already does an amazing job at this, so I may not want to re-invent the wheel. If I can think of anything I can create that the undermine journal doesn't do, then I'll work on that too.


I'm continually surprised and challenged by what other folks do with the AH data. Looking forward to what you come up with. :)

I started collecting this data for a couple of reasons. First, I was curious about how rare a specific mount was, stuff like that. But then I realized this data may be found interesting by other people also. Second, I want to improve my skills in the various technologies I used, such as python, Data Modeling, database development, django, etc... I've learned a lot in this process, so even if the site is stupid, I don't care, I got something great out of it.


That's the best way to look at it. The API is a great resource to use something you're interested in (WoW) to learn how to do new stuff in app development. And if people enjoy using it, all the better.
Reply Quote
90 Night Elf Priest
12995

To get mount/pet/profession data, you'll have to query every character individually. If you want to update every character at least once per month, you're looking at over 600,000 requests per day per region, or 7 requests per second. I think the API key is typically limited to 50,000 requests per day. How will you handle so many requests, or will updates be less frequent than monthly, or..?


Yes, I anticipate this being pretty hefty. Right now I do a simple update where if it's been > X old, then update the character. However I'm working on an algorithm that will hopefully intellegently update characters that are active more often, and update rarely characters that are basically 'inactive'.

The algorithm will track when a character has actually been updated since the last time I checked (via the HTTP last modified header). I will have a numerical value for each character. If we pull the data and it has NOT been modified, I update our 'last time we checked' value and move on. If I pull data and it has indeed changed, then I do all my data updating, and I increase this counter by 1.

Once in a while, I'll globally decrease all values of this counter by 1. This means characters that are very rarely updated will have a low number, and characters that are updated regularly will have a high counter. With this, I can then say “update characters whit a counter greater than 7 once a week, update characters with a counter less than 7 once a month).


Your character achievement rankings match what I have in Realm Pop, so that bit looks good. I thought about doing achievement ranks, but then figured that WoW-Achievements.com had that covered. Looks like their crawler is quite a bit delayed, since your stats and mine have higher achievement points than what they report.


Yep, I remember finding wow-achievements.com and I assume that they, like so many other sites, only update a character upon request. Which means if you never update your character, nothing changes. I've always disliked that because you could easily be number 1 because the real number 1 was just lazy :-p.


In the future, I plan to add more categories, such as a list of people with the Insane title, or super rare mounts, as well as PvP stats.


That's potentially some very big data. If you were to track all achievements earned, for example, that's hundreds of millions of rows. Then add professions and individual mounts (instead of just mount counts) and that's a huge amount, too. I'd like to see how those queries return in a reasonable time. ;) Then again, you could run them ahead of time on a schedule, and just return static pages, like I do with Realm Pop. It takes about an hour to generate all the realms' pages.


Yep, it will be a ton of data. Right now, I'm not collecting all individual achievements, but I will be. I am however collecting every single pattern, mount, and pet that each character has. I could just do aggregate counts, but that means I can't do things like “who has the heroic lich king mount” types of queries, so I'm just grabbing everything.

For the 3 realms I'm collecting data for, my total data size is about 7GB (not including any auction data, that's a ton right there).

For this, my main database is a server at home, and it will process all my data on schedule, placing the 'processed' data into specific tables. Those tables are then replicated up to the database on the web box via the Slony replication tool (http://slony.info/). I'm building everything with database ETL warehousing in mind. What I hope to do is to set up various stored procedures and all that will update data on the fly as data changes, so that I don't need to do a 'update these tables once per hour' type of thing, but I'll need to make sure that doesn't slow down data ingestion too much. For the moment, everything in the realmstats section is updated once per hour from my main database, and those characters are updated once per week.

Thanks for your thoughts :-)
Edited by Rukiak on 6/4/2012 1:14 PM PDT
Reply Quote

Please report any Code of Conduct violations, including:

Threats of violence. We take these seriously and will alert the proper authorities.

Posts containing personal information about other players. This includes physical addresses, e-mail addresses, phone numbers, and inappropriate photos and/or videos.

Harassing or discriminatory language. This will not be tolerated.

Forums Code of Conduct

Report Post # written by

Reason
Explain (256 characters max)
Submit Cancel

Reported!

[Close]