Advanced Databases
Programmes: MSc Advanced Computer Science / MSc Data Science
Module Code: KL7011
Module Title: Advanced Databases
Distributed on: Monday 7th December 2020
Submission Time and Date: 16 January 2021 by 18:00 GMT
Date by which Work and Feedback will be returned to Students: 23rd February 2021
Weighting: This coursework accounts for 40% of the total marks for this module.
Group Work: This assessment is designed to be undertaken by a group comprising THREE students.
Submission of Assessment: Electronic Management of Assessment (EMA): Please note your assignment is to be submitted electronically and it will be submitted online on Blackboard. It is your responsibility to ensure that your assignment is submitted before the submission deadline stated above. See the University policy on late submission of work.
Instructions on Assessment:

You are expected to produce a word-processed answer to this assignment. Please use Arial font and a font size of 12. For SQL code and output, you can use Courier New font, which preserves SQL format and layout.
You are required to use the Harvard style of referencing and citation. The “Cite them right” guide is recommended for referencing and citation (Pears and Shields, 2008) and should be followed throughout your work where needed. Please do not include references to lecture notes.

Mapping to Programme Goals and Objectives

This assignment covers the following learning outcomes for the module:

Knowledge & Understanding:

2. Demonstrate deep knowledge of key concepts of data warehousing, data analytics, data standards, and data quality
Intellectual / Professional skills & abilities:

4. Appraise, analyse, design, develop and evaluate data warehousing and data analytics solutions using Oracle database system
Personal Values Attributes (Global / Cultural awareness, Ethics, Curiosity) (PVA):
5. Develop critical awareness of the responsibilities of database developer with respect to professional, legal, security and ethical issues individually or as part of a team

Assessment Regulations

You are advised to read the guidance for students regarding assessment policies (Northumbria, 2019). They are available online here.

Late submission of work

Where coursework is submitted late without approval, after the published hand-in deadline, the following penalties will apply.

For coursework submitted up to 1 working day (24 hours) after the published hand-in deadline without approval, 10% of the total marks available for the assessment (i.e. 100%) shall be deducted from the assessment mark.

Coursework submitted more than 1 working day (24 hours) after the published hand-in deadline without approval will be regarded as not having been completed. A mark of zero will be awarded for the assessment and the module will be failed, irrespective of the overall module mark.

These provisions apply to all assessments, including those assessed on a Pass/Fail basis.

The full policy can be found here.

Students must retain an electronic copy of this assignment (including ALL appendices), and it must be made available within 24 hours of a request for it to be re-submitted.

Academic Misconduct

The Assessment Regulations for Taught Awards (ARTA) contain the Regulations and procedures applying to cheating, plagiarism and other forms of academic misconduct.

The full policy is available here.

You are reminded that plagiarism, collusion and other forms of academic misconduct, as referred to in the Academic Misconduct procedure of the assessment regulations, are taken very seriously. Assignments in which evidence of plagiarism or other forms of academic misconduct is found may receive a mark of zero.

Criteria for success:

For textual components:
80-100% – The description will excellently cover all the specific topics requested. The written work will be fluent, clearly presented and of outstanding quality.

70-79% – The description will comprehensively cover all the specific topics requested. The written work will be fluent and clearly presented and of distinctive quality.

60-69% – The student will show a very good knowledge of the specific topics, with very good presentation skills and quality.

50-59% – The student will show an above average knowledge of the specific topics, with above average presentation skills and quality.

40-49% – There will be an inadequate description of a significant proportion of the topics requested. There will be no major failures in clarity of presentation, though it will be partly inadequate.

Less than 40% – There will be little or no information conveyed in an intelligible manner on the specific topics requested.


For SQL and other database technical components:
80-100% – The students will produce exceptional models and solutions, and will demonstrate the use of notation/language with outstanding syntactic accuracy (e.g., following sound algorithms, standards, methods, error-free SQL code) and exceptional semantic relevance (e.g., relevant to the requirements of the particular scenario).

70-79% – The students will produce fully complete models and solutions, and will demonstrate the use of notation/language, which have high syntactic accuracy, with high semantic relevance.

60-69% – The students will produce almost complete models and solutions, and will demonstrate the use of notation/language, which have appropriate syntactic accuracy with reasonably good semantic relevance.

50-59% – The students will produce fairly complete models and solutions, and will demonstrate the use of notation/language, which have adequate syntactic accuracy with reasonable semantic relevance.

40-49% – The students will produce models and solutions, and will demonstrate the use of notation/language, which have some syntactic accuracy and semantic relevance but on balance inadequate as a whole.

Less than 40% – The students will not produce sufficient models and solutions, and/or will be unable to demonstrate the use of notation/language with significant syntactic accuracy and/or significant semantic relevance.

Assessment Background and Scenario

This assessment is based on two scenarios as follows:

The Sales History (SH) Data Warehouse scenario. SH is a sample database schema provided by Oracle, which has been extensively used in Oracle’s Data Warehousing Guide (Lane, 2013). The details of this scenario are provided in Appendix 1.
UniTel’s Customer Churn dataset described in Appendix 2.

Assignment Questions

Part 1: Data Warehousing Tasks (50 Marks)

This part is based on the Sales History scenario as described in Appendix 1.

You must submit all the SQL queries and any other code that you wrote in answering any of the tasks / questions (e.g., the use of Explain Plan statements for the queries and their outputs using Spooling or other suitable means).
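
As a rough illustration of capturing a statement and its execution plan in a single spool file, a SQL*Plus session along the following lines may help. The spool file name and the sample query are placeholders chosen for illustration and are not prescribed by this brief; the column name channel_id assumes Oracle's standard SH sample schema.

```sql
-- Hypothetical file name; any writable path will do.
SPOOL part1_evidence.txt

SET ECHO ON
SET LINESIZE 200
SET PAGESIZE 100

-- Store the optimizer's estimated plan for the statement without running it.
EXPLAIN PLAN FOR
SELECT channel_id, COUNT(*)
FROM   SH2.SALES
GROUP  BY channel_id;

-- Display the most recently explained plan, including the estimated cost.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

SPOOL OFF
```

Repeating the same statement with the DWUn prefix in place of SH2 gives a second plan whose cost can be compared with the first.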

1. Study the index definitions in sh_idx.sql. Discuss in detail (using cost-based analysis) why these indexes (choose three different ones) are useful for answering queries over the SH2 and DWU versions of the database. You should not run the sh_idx.sql script at all.
(9 marks)

2. Identify three new indexes and justify why they could be useful. Write the SQL code for creating these indexes under your DWU account. Give example queries with cost-based analysis for both the DWU account (which will have the new indexes) and the SH2 shared schema (which will NOT have any of your new indexes). Alternatively, you may choose to run the same queries on your DWU account before and after creating your proposed three indexes.
(9 marks)

3. Given the two materialized views (MVs) defined in sh_cremv.sql, discuss in detail why these MVs are useful for users of the SH database. You should provide detailed examples of cost-based analysis, e.g., using Explain Plan for running sample queries on both SH2 and DWU to illustrate your answer. You should not run the sh_cremv.sql script at all.
(8 marks)

4. Identify three new MVs based on the base tables in the SH schema under your DWU account and justify why they would be useful for the users of your data warehouse. Write the SQL code for creating these MVs. Moreover, run sample queries on both SH2 and DWU to ensure that queries running on DWU will be re-written by Oracle to use your proposed three MVs instead of the base tables used in the sample queries. Note that you must not query your MVs directly in the FROM clause; let the Oracle Query Optimizer re-write the queries and answer them using your proposed MVs.
(12 marks)
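
The query rewrite mechanism referred to in the MV task above can be sketched as follows. This is a deliberately trivial, single-table illustration and not a suggested answer: the MV name sales_by_channel_mv is hypothetical, and the column names channel_id and amount_sold assume Oracle's standard SH sample schema.

```sql
-- Hypothetical MV; ENABLE QUERY REWRITE makes it a candidate for rewrite.
CREATE MATERIALIZED VIEW DWUn.sales_by_channel_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT channel_id,
       SUM(amount_sold) AS total_amount,
       COUNT(*)         AS row_count
FROM   DWUn.SALES
GROUP  BY channel_id;

-- Usually already TRUE; set here only to make the assumption explicit.
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;

-- The query names only the base table, never the MV.
EXPLAIN PLAN FOR
SELECT channel_id, SUM(amount_sold)
FROM   DWUn.SALES
GROUP  BY channel_id;

-- If the optimizer rewrote the query, the plan shows a
-- MAT_VIEW REWRITE ACCESS step instead of a scan of SALES.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Explaining the same query against SH2.SALES, where this MV does not exist, gives the baseline plan for comparison.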

5. Prior to the introduction of the special aggregation function CUBE, there was no possibility to express an aggregation over different levels within a single SQL statement without using the set operation UNION ALL. Every different aggregation level needed its own SQL aggregation expression, operating on the exact same data set n times, once for each of the n different aggregation levels. With the introduction of CUBE in recent database systems, Oracle provided a single SQL command for handling the aggregation over different levels within a single SQL statement, not only improving the runtime of this operation but also reducing the number of internal operations necessary to run the query and reducing the workload on the system.

(a) Using CUBE, write an SQL query over the SH schema under your DWU account involving one fact table (SALES or COSTS), at least two dimension tables and at least three grouping attributes. Provide output of successful execution of your query. Provide reasons why your query may be useful for users of the SH data warehouse.
(3 marks)

(b) Using the set operation UNION ALL (and not CUBE), write an SQL query that produces the same result as the query in (a) above. Provide output of successful execution of your query.
(5 marks)

(c) Using EXPLAIN PLAN, provide a detailed discussion analysing the costs of evaluating the above queries (i.e. with and without CUBE).
(4 marks)
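
To make the equivalence described in the CUBE task above concrete, here is a minimal sketch using the fact table alone with two grouping columns, which is intentionally smaller than what the task asks for. The column names channel_id, promo_id and amount_sold assume Oracle's standard SH sample schema.

```sql
-- With CUBE: one statement yields all 2^2 = 4 grouping combinations.
SELECT channel_id, promo_id, SUM(amount_sold) AS total_amount
FROM   DWUn.SALES
GROUP  BY CUBE (channel_id, promo_id);

-- Without CUBE: one branch per grouping level, each reading SALES again.
SELECT channel_id, promo_id, SUM(amount_sold) AS total_amount
FROM   DWUn.SALES
GROUP  BY channel_id, promo_id
UNION ALL
SELECT channel_id, NULL, SUM(amount_sold)
FROM   DWUn.SALES
GROUP  BY channel_id
UNION ALL
SELECT NULL, promo_id, SUM(amount_sold)
FROM   DWUn.SALES
GROUP  BY promo_id
UNION ALL
SELECT NULL, NULL, SUM(amount_sold)
FROM   DWUn.SALES;
```

Both statements return the same set of rows, though not necessarily in the same order. With n grouping attributes the UNION ALL form needs 2^n branches, which is why the number of branches grows quickly as attributes are added.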

Part 2: Data Mining Tasks (35 Marks)

This part is based on the UniTel scenario as described in Appendix 2. Moreover, you must use the DMUn Oracle Data Mining Account (where 1 <= n <= 75, e.g., DMU1, DMU2) allocated to your group.

Jessica is the customer relations manager at UniTel. She wants to know the likelihood of churn among the company’s customers based on previous experience, so that she may be able to take action to retain them.

To help Jessica in doing her analysis, we need to investigate what could be a suitable algorithm for solving her problem. The data from last year are used as the training data and the data of February of this year are taken as the testing data to verify the model accuracy. Data of all the columns are used to set up the model. To meet the requirement, many algorithms can be selected.

Oracle Data Mining (ODM) provides the following algorithms for classification:

Decision Tree
Naive Bayes
Generalized Linear Models (GLM)
Support Vector Machines (SVM)
You are required to perform the following tasks:

Using PL/SQL API, SQL-Developer’s Data Miner Workflows or RODM (R package for Interfacing ODM), develop at least TWO models based on the above algorithms for the dataset accessible as CUSTOMERCHURN table (15 marks)
Evaluate capabilities of the models you have developed (10 marks)
Present and discuss your findings and make recommendations for Jessica (10 marks)
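
For groups choosing the PL/SQL API route, the overall shape of building and checking one classification model might look like the sketch below. It rests on several assumptions that are not stated in this brief: the model name CHURN_DT_MODEL and the settings table churn_dt_settings are hypothetical, the case id column is assumed to be called CUSTOMERID, and CUSTOMERCHURN is assumed to be visible without a schema prefix.

```sql
-- Settings table in the layout expected by DBMS_DATA_MINING.
CREATE TABLE churn_dt_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Select the Decision Tree algorithm for the classification model.
  INSERT INTO churn_dt_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_DECISION_TREE);

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CHURN_DT_MODEL',     -- hypothetical name
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTOMERCHURN',
    case_id_column_name => 'CUSTOMERID',         -- assumed column name
    target_column_name  => 'CHURN',
    settings_table_name => 'CHURN_DT_SETTINGS');
  COMMIT;
END;
/

-- Confusion matrix: actual churn value against the model's prediction.
SELECT churn AS actual,
       PREDICTION(CHURN_DT_MODEL USING *) AS predicted,
       COUNT(*) AS customers
FROM   CUSTOMERCHURN
GROUP  BY churn, PREDICTION(CHURN_DT_MODEL USING *)
ORDER  BY actual, predicted;
```

Scoring the same rows that were used for training, as this sketch does, tends to overstate accuracy; a proper evaluation would build the model on a training subset and compute the confusion matrix on held-out test data, in line with the training/testing split described above.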
Part 3 (15 marks)

Critically evaluate the SH data warehouse and the UniTel’s Customer Churn data in relation to theory and best practices of data quality and standards.

The report should be concise and comprehensive and in the region of 900-1000 words. You should use the Harvard style of citation and referencing by following the guidelines in Pears and Shields (2008).

Hand-in procedure: You should hand in your answer for this assignment as a single word-processed document to be electronically submitted on Blackboard. A sample assignment template file will be uploaded on Blackboard near the submission deadline.

Appendix 1
The Sales History (SH) Data Warehouse

SH is a sample database schema provided by Oracle, which has been extensively used in Oracle’s Data Warehousing Guide (Potineni, 2017). The SH schema, as shown in Figure 1, consists of a big fact table, SALES, and five relatively small dimension tables: TIMES, PROMOTIONS, CHANNELS, PRODUCTS and CUSTOMERS. The additional COUNTRIES table linked to CUSTOMERS creates a simple snowflake. The model and the attributes are aimed at demonstrating data warehousing functionality like star transformation and query rewrite. They do not necessarily represent the optimal approach for this kind of data warehouse in real production environments; such a design would be driven more by business requirements than by the star itself. Table 1 shows the cardinalities of the tables in the SH schema.

Table         Number of Rows
Customers     50,000
Countries     19
Products      10,000
Sales         1,016,271
Costs         787,766
Times         1,461
Promotions    501
Channels      5

Table 1: Cardinalities of SH tables (as per SH2 implementation).

Figure 1: SH Star Schema.

Usually, a star schema includes a single fact table and a few dimension tables. However, the SH schema designer identified another useful fact table called COSTS, which is linked to the TIMES and PRODUCTS dimension tables. Figure 3 shows two fact tables: SALES and COSTS.

Two versions of SH schema/database

The default and optimized version of the SH database is created under the SH2 user. You have read-only access to query any of the tables of SH database by prefixing any table or view name by “SH2.”, e.g., to query the contents of CHANNELS table, use:
SELECT * FROM SH2.CHANNELS;

An un-optimized version of the SH database has been created under your own username DWU. You will be given a special username and password for doing this part of the assignment (i.e. an Oracle account starting with DWU and followed by a number, e.g. DWU1, DWU2, etc.). Throughout this part, you MUST use this special username.

Important Note

Whenever you need to assess the performance of existing database structures (e.g., materialized view, index), you must refer to SH2 as above (using the “SH2.” prefix). Any new database structures that you need to create should be created under your username. Likewise, when you need to assess the performance of any new database structures that you create under your DWU username, you need to prefix all tables/views/materialized views accordingly, e.g., to query the contents of your CHANNELS table, use:

SELECT * FROM DWUn.CHANNELS;

You should replace n by your specific number allocated to you, e.g., DWU1, DWU2, etc.

Appendix 2

UniTel – A Telecom Company’s Customers Data

Customer churn or customer attrition is a phenomenon that occurs when customers or subscribers break their relationship with a company or service (https://en.wikipedia.org/wiki/Customer_attrition). Churn has a direct impact on a company’s profitability. One industry in which customer churn is widespread is the telecommunications industry. Therefore, telecom companies are interested to know more about customer behaviours in order to gain more profit. In addition, analysis of the customer churn problem helps to identify and improve other aspects of customer services.

A fictitious telecom company, UniTel (Universal Telecommunications), is concerned about a recent reduction of its customer base. Customers ending their relationship with the company is one of the most basic factors determining its revenue. The company needs to know which of its customers are loyal and which are at risk of churning, so it needs to know the factors that affect customers’ decisions to leave.

UniTel has produced a dataset about their customers. Each tuple/row corresponds to a single customer and it records various demographic and service usage information. There are a total of 7043 customers (with unique customer ids from 1 to 7043). The data contains 20 columns, e.g., Customers’ subscription to the services (phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies); Account information of each customer (contract, paperless billing, monthly charges, and total charges); and demographic information (gender, whether senior citizen, and if they have partners and dependents). The column “churn” in the dataset shows the information as to whether the customer churned or not. The value “0.0” means we still kept the customer; otherwise, the value is “1.0”, which means the customer has left the company.

A table called CUSTOMERCHURN is available as a shared table in the Oracle database. Number of Rows: 7043, Number of Columns: 20

First Six Customers Data

Bottom Six Customers Data

References & Bibliography

Northumbria (2019) Academic Regulations for Taught Awards. Available at: https://www.northumbria.ac.uk/about-us/university-services/academic-registry/quality-and-teaching-excellence/assessment/guidance-for-students (Accessed: 4 October 2020).

Pears, R. and Shields, G. (2008) Cite them right: the essential referencing guide. Newcastle upon Tyne: Pear Tree Books. Available at: https://www.citethemrightonline.com/ (Accessed: 4 October 2020).

Potineni, P. (2017) Oracle Database Data Warehousing Guide, 12c Release 1 (12.1). Part Number E41670-11. Available at: https://docs.oracle.com/database/121/DWHSG/ (Accessed: 17 November 2020).

Surampudi, S. (2017a) Data Mining User’s Guide, 12c Release 1 (12.1). Part Number E53115-05. Available at: https://docs.oracle.com/database/121/DMPRG/toc.htm (Accessed: 17 November 2020).

Surampudi, S. (2017b) Oracle Data Mining Concepts, 12c Release 1 (12.1). Part Number E17692-19. Available at: https://docs.oracle.com/database/121/DMCON/toc.htm (Accessed: 17 November 2020).
