Assignments
See the schedule for due dates. Assignments are due at 11:59 pm on the due date. Submit all assignments on eLc (not by e-mail). I will deduct points for late submissions (10 percent of the total points for each day late, up to 2 days). Assignments submitted more than 2 days after the due date and time will not be accepted (except for special accommodations).
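The late policy above can be read as simple arithmetic. The following is a minimal sketch, not an official grading script: the function name `late_penalty`, the example due date, and the assumption that any part of a day past the deadline counts as a full day late are all illustrative choices that the policy does not state.

```python
import math
from datetime import datetime, timedelta


def late_penalty(total_points, due, submitted, rate=0.10, max_days=2):
    """Return the points deducted, or None if the submission is too late to be accepted."""
    if submitted <= due:
        return 0.0
    # Assumption: any part of a day past the deadline counts as a full day late.
    days_late = math.ceil((submitted - due) / timedelta(days=1))
    if days_late > max_days:
        return None  # not accepted without a special accommodation
    return rate * total_points * days_late


due = datetime(2020, 7, 10, 23, 59)  # hypothetical due date, for illustration only
print(late_penalty(5, due, due + timedelta(hours=3)))   # one day late on a 5-point assignment
print(late_penalty(5, due, due + timedelta(hours=30)))  # two days late
print(late_penalty(5, due, due + timedelta(days=3)))    # past the 2-day window
```

Under these assumptions, a 5-point assignment loses 0.5 points per late day, and anything past the 2-day window returns `None`.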
Individual
id | topic | description | points |
a1 | SQL | Complete DataCamp's introduction to SQL module (4 hours) | 5 |
a2 | SQL | Complete DataCamp's intermediate SQL module (4 hours) | 5 |
a3 | analytics | Complete the tasks explained below (4 hours) | 5 |
To complete the required DataCamp modules (a1 and a2) you will need to:
- Sign up for an account using your full name and UGA e-mail address
- Join the MIST 4610 Summer 2020 team (an invite will be sent via e-mail on July 6)
For a3, use exploratory.io to:
- Import the energy stats dataset, plot the energy use of the five largest economies in the world for 1990-2016 (USA, China, Japan, Germany, and France), and create a map showing energy consumption for these countries in 2016. Which country consumed the most energy in 2016? (descriptive analytics).
- Read the nyc bike counts dataset, create a pivot table about trips into and out of Manhattan by bridge and day of the week, and develop a linear regression model to predict trips each day of the week for each bridge. Do you have any concerns with the model? (explanatory analytics).
- Load the online retail dataset, compute the revenue of each sale, and forecast total weekly revenue for the next 10 weeks. Create a visualization showing both actual and forecasted values. Should you consider seasonality? (predictive analytics).
Publish your work on exploratory.io. Take screenshots and submit a PDF file (on eLc) containing: the plot and map images for descriptive analytics; the pivot table, the linear regression model (with an interpretation of significance and estimates), and a graph of the residuals for explanatory analytics; and the 10-week forecast and a graph showing actual and forecasted values for predictive analytics.
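For the predictive task, "revenue of each sale" and "total weekly revenue" are plain calculations that can be checked by hand before forecasting. The sketch below is only an illustration in Python and is not part of the exploratory.io workflow. The three sample rows and the field names (`invoice_date`, `quantity`, `unit_price`) are made up, and the dataset's actual column names may differ. Weeks are assumed to start on Monday.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical rows: (invoice_date, quantity, unit_price).
sales = [
    (date(2011, 1, 3), 2, 2.50),
    (date(2011, 1, 5), 1, 10.00),
    (date(2011, 1, 11), 4, 1.25),
]

weekly_revenue = defaultdict(float)
for invoice_date, quantity, unit_price in sales:
    revenue = quantity * unit_price  # revenue of one sale
    # Assumption: a week starts on Monday; weekday() is 0 for Monday.
    week_start = invoice_date - timedelta(days=invoice_date.weekday())
    weekly_revenue[week_start] += revenue

for week_start in sorted(weekly_revenue):
    print(week_start, weekly_revenue[week_start])
```

The weekly totals produced this way are the series that a forecast would extend by 10 weeks.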
Four short quizzes (q1-q4), each worth 5 points, with MC and T/F questions will be given on eLc.
Group
A state-of-the-art presentation (soa) is required from each group on a database technology not covered in the course, with a particular concentration in open-source solutions.
- You will record a 5-8 minute presentation that will be available to the class
- Points will be deducted for exceeding 8 minutes
- Focus on the applications of the tech and the opportunities it provides (see Neo4j)
- All team members must present
- For submission, provide a link to (or a file of) the video and the slides (on eLc)
- MapD
- MariaDB
- MongoDB
- CockroachDB
- MS Access
- PostgreSQL
- SQLite
- SAP HANA
- Cassandra
- GraphDB
A database analytics project is also required from each group. You will design a data model (at least 10 entities) to support a database of your choice. Review existing mobile applications that store data, such as Spotify and Walmart, choose a context that interests you, and exercise your creativity to determine the data model's entities. When reviewing an application, think of potential improvements that could be made and incorporate them into your model. Consider designing a database for a local business or for a context in which you can download historical data (see Kaggle for ideas). Doing both (i.e., designing a database for a local business where historical data are available) would be ideal but difficult to accomplish. Because data modeling itself is also difficult, a preliminary model can be submitted for review by July 19 via e-mail.
Once the data model is finalized, convert it to a relational database, populate the tables with real or simulated data (some past groups had thousands of observations in each table, others had hundreds, and some had 50 or fewer), and write 10 queries to access the database. The queries should demonstrate the breadth of your understanding of SQL (i.e., 10 simple queries will not score as well as, say, 4 simple queries and 6 nontrivial queries). Next, pose 3 questions of interest about the database, use exploratory.io to answer these questions, and create a report with the results.
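Populating tables with simulated data can be scripted. The following is a minimal sketch, assuming a two-table toy schema (`customer`, `purchase`) that is purely hypothetical and far smaller than the required 10 entities. It uses Python's built-in `sqlite3` module with an in-memory database only so that it runs anywhere; the DBMS used for the project may differ, and the row counts are arbitrary.

```python
import random
import sqlite3

random.seed(4610)  # fixed seed so the simulated data is reproducible

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
""")

# 50 simulated customers with placeholder names.
customers = [(i, f"Customer {i}") for i in range(1, 51)]
conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)

# 500 simulated purchases, each assigned to a random existing customer.
purchases = [
    (i, random.randint(1, 50), round(random.uniform(5, 200), 2))
    for i in range(1, 501)
]
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", purchases)
conn.commit()

n_customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
n_purchases = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
print(n_customers, n_purchases)
```

Generating foreign keys from the range of existing primary keys, as done here for `customer_id`, keeps the simulated rows consistent with the relationships in the data model.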
When submitting the database analytics project, you should provide the following:
- Your team name and the names of its members
- A one paragraph description of the database
- The data model in png format (File > Export > Export as PNG…)
- 10 queries
- A natural language description of the queries
- The SQL and the results (Query > Execute (All or Selection) to Text)
- The 10 queries should cover the following SQL features:
- multiple table join
- subquery
- correlated subquery
- GROUP BY and GROUP BY with HAVING
- ORDER BY
- divide
- IN or NOT IN
- A built-in function (e.g., AVG) or a calculated field
- REGEXP
- EXISTS or NOT EXISTS, other than divide
feature | query 1 | query 2 | ... |
multiple table join | X | | |
subquery | X | X | |
... | | | X |
- 3 questions and answers
- A natural language description of the question and answer
- Findings (e.g., visualizations) from exploratory.io
- The analytics in exploratory.io should be at least descriptive
- Copy and paste everything into a single Word document
- Save as PDF and submit to the assignment dropbox on eLc as the attachment
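As an illustration of what "nontrivial" queries from the feature list can look like, here is a minimal sketch covering a multiple table join with GROUP BY, HAVING, and ORDER BY, a correlated subquery, and divide written with a double NOT EXISTS. The three tables (`customer`, `product`, `purchase`) and their rows are hypothetical. SQLite is used through Python's `sqlite3` module only because it needs no setup, so the syntax may differ slightly from the course's DBMS. REGEXP is not shown because SQLite does not ship a default implementation of that operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, title TEXT, price REAL);
    CREATE TABLE purchase (cust_id INTEGER, prod_id INTEGER, qty INTEGER);
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO product  VALUES (10, 'pen', 2.0), (20, 'ink', 8.0);
    INSERT INTO purchase VALUES (1, 10, 3), (1, 20, 1), (2, 10, 5), (3, 20, 2);
""")

# Multiple table join + calculated field + GROUP BY with HAVING + ORDER BY:
# customers who spent more than 10 in total, highest spender first.
big_spenders = conn.execute("""
    SELECT c.name, SUM(pu.qty * pr.price) AS spent
    FROM customer c
    JOIN purchase pu ON pu.cust_id = c.cust_id
    JOIN product  pr ON pr.prod_id = pu.prod_id
    GROUP BY c.name
    HAVING SUM(pu.qty * pr.price) > 10
    ORDER BY spent DESC
""").fetchall()

# Correlated subquery: purchases whose quantity exceeds the average
# quantity purchased for that same product.
above_average = conn.execute("""
    SELECT c.name, pu.prod_id
    FROM purchase pu
    JOIN customer c ON c.cust_id = pu.cust_id
    WHERE pu.qty > (SELECT AVG(p2.qty)
                    FROM purchase p2
                    WHERE p2.prod_id = pu.prod_id)
    ORDER BY c.name
""").fetchall()

# Divide: customers for whom there is no product they have not purchased,
# i.e., customers who bought every product.
bought_everything = conn.execute("""
    SELECT c.name
    FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM product pr
        WHERE NOT EXISTS (
            SELECT 1 FROM purchase pu
            WHERE pu.cust_id = c.cust_id
              AND pu.prod_id = pr.prod_id))
""").fetchall()

print(big_spenders)
print(above_average)
print(bought_everything)
```

Each query in the report would be paired with a natural language description like the comments above, followed by the SQL and its result.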