Monday, June 6, 2011

Found Some Data

I'm beginning to prepare for a new iteration of my data modeling class for the Fall semester. I wanted a larger dataset than I have used in the past so that class assignments and exercises could involve more meaningful queries. Today I took a step in the right direction.

I came across the University of California, Irvine (UCI) Machine Learning Repository. The site hosts a wide range of datasets that can be used for statistical analysis and other data work. One that I found useful is the Dodger Loop Sensor dataset, which comes in two files. The first contains the game information for each home game during the 2005 baseball season, and the second contains counts of the automobiles passing through an exit near Dodger Stadium. I was able to break up the game information into game and team tables and then use a third table to hold the traffic information. With these three tables, my students will be able to explore the data, find traffic patterns, and relate them to the baseball games.
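The post doesn't show the schema, so here is a minimal sketch of how the three tables might be laid out in SQLite (driven from Python's built-in sqlite3 module), plus the kind of join a student might write to relate traffic counts to game times. Every table and column name here (team, game, traffic, car_count, and so on) is a hypothetical choice for illustration, not the actual class schema.

```python
import sqlite3

# Hypothetical three-table layout; names and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE team (
    team_id   INTEGER PRIMARY KEY,
    team_name TEXT NOT NULL UNIQUE
);

CREATE TABLE game (
    game_id     INTEGER PRIMARY KEY,
    game_date   TEXT NOT NULL,        -- zero-padded 'YYYY-MM-DD'
    start_time  TEXT NOT NULL,        -- zero-padded 'HH:MM'
    end_time    TEXT NOT NULL,        -- zero-padded 'HH:MM'
    attendance  INTEGER,
    opponent_id INTEGER NOT NULL REFERENCES team(team_id),
    result      TEXT
);

CREATE TABLE traffic (
    reading_time TEXT PRIMARY KEY,    -- zero-padded 'YYYY-MM-DD HH:MM'
    car_count    INTEGER
);
"""

# Average car count over the sensor readings taken while each game was in progress.
# Because all timestamps are zero-padded text, BETWEEN compares them correctly as strings.
GAME_TRAFFIC_QUERY = """
SELECT g.game_date,
       t.team_name       AS opponent,
       AVG(tr.car_count) AS avg_cars
FROM game AS g
JOIN team AS t
  ON t.team_id = g.opponent_id
JOIN traffic AS tr
  ON tr.reading_time BETWEEN g.game_date || ' ' || g.start_time
                         AND g.game_date || ' ' || g.end_time
GROUP BY g.game_id, g.game_date, t.team_name
ORDER BY g.game_date;
"""


def build_database(path=":memory:"):
    """Create the (hypothetical) team, game, and traffic tables and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def traffic_during_games(conn):
    """Return (game_date, opponent, avg_cars) rows, one per game with matching readings."""
    return conn.execute(GAME_TRAFFIC_QUERY).fetchall()
```

Storing timestamps as zero-padded text keeps the join simple. If the raw files use a different date format, it would need to be converted during loading. Comparing a window after each game's end time instead of during the game is a natural follow-up exercise.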

The dataset is far from perfect, but its 81 game records and 50,400 traffic records far exceed the 18-record hotel reservation database I created last year. I plan to keep using both databases, but the new one will give students much more data to explore than the smaller one.

Skills to Look for in Project Managers

Today I read a brief article describing the eight skills to look for when hiring an IT project manager. The headlines caught my attention...