SQL Practice Queries
Database Name: EngineeringDB
Database Schema Overview
Students Table
student_id (int): Unique student ID
name (varchar(50)): Student name
department (varchar(50)): Department (e.g., CSE, ECE, MECH)
year (int): Year of study (e.g., 1, 2, 3, 4)
Courses Table
course_id (int): Unique course ID
course_name (varchar(100)): Course name
department (varchar(50)): Department offering the course
credits (int): Course credits
Grades Table
student_id (int): FK referencing Students
course_id (int): FK referencing Courses
grade (varchar(2)): Grade obtained (e.g., A, B, C, D, F)
Q1: Retrieve the names of all students in the CSE department.
SELECT name FROM Students WHERE department = 'CSE';
Q2: Find the course names and credits for all courses offered by the ECE department.
SELECT course_name, credits FROM Courses WHERE department = 'ECE';
Q3: Get the student IDs and names of students who have scored an A grade in any course.
SELECT S.student_id, S.name FROM Students S JOIN Grades G ON S.student_id = G.student_id WHERE G.grade = 'A';
Q4: List the course IDs and names of courses that have more than 3 credits.
SELECT course_id, course_name FROM Courses WHERE credits > 3;
Q5: Find the average grade of students in the MECH department.
(Assume grade points: A=10, B=8, C=6, D=4, F=0)
SELECT AVG(CASE G.grade WHEN 'A' THEN 10 WHEN 'B' THEN 8 WHEN 'C' THEN 6 WHEN 'D' THEN 4 WHEN 'F' THEN 0 END) AS average_grade FROM Students S JOIN Grades G ON S.student_id = G.student_id WHERE S.department = 'MECH';
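A quick way to sanity-check this query is to run it on a tiny throwaway dataset. The sketch below uses Python's sqlite3 with made-up students and grades (names, IDs, and rows are hypothetical, not from EngineeringDB):

```python
import sqlite3

# Two MECH students (grades A and B) and one CSE student whose grade must not count.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (student_id INT, name TEXT, department TEXT);
CREATE TABLE Grades (student_id INT, course_id INT, grade TEXT);
INSERT INTO Students VALUES (1, 'Asha', 'MECH'), (2, 'Ravi', 'MECH'), (3, 'Meera', 'CSE');
INSERT INTO Grades VALUES (1, 10, 'A'), (2, 10, 'B'), (3, 10, 'F');
""")

(average_grade,) = conn.execute("""
    SELECT AVG(CASE G.grade WHEN 'A' THEN 10 WHEN 'B' THEN 8
               WHEN 'C' THEN 6 WHEN 'D' THEN 4 WHEN 'F' THEN 0 END)
    FROM Students S JOIN Grades G ON S.student_id = G.student_id
    WHERE S.department = 'MECH'
""").fetchone()

print(average_grade)  # 9.0 -> (10 + 8) / 2; the CSE grade is excluded by the WHERE clause
```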
Q6: Retrieve the student IDs and names of students who have not scored any grade.
SELECT student_id, name FROM Students WHERE student_id NOT IN (SELECT student_id FROM Grades);
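One caveat worth noting: NOT IN returns no rows at all if the subquery ever yields a NULL, so a LEFT JOIN anti-join is often the safer form. A minimal sqlite3 sketch with hypothetical data shows both forms agreeing when there are no NULLs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (student_id INT, name TEXT);
CREATE TABLE Grades (student_id INT, course_id INT, grade TEXT);
INSERT INTO Students VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO Grades VALUES (1, 10, 'A');
""")

# NOT IN form from the notes: correct as long as Grades.student_id has no NULLs.
not_in = conn.execute(
    "SELECT student_id, name FROM Students "
    "WHERE student_id NOT IN (SELECT student_id FROM Grades)"
).fetchall()

# Equivalent anti-join form that also tolerates NULLs in Grades.student_id.
anti_join = conn.execute(
    "SELECT S.student_id, S.name FROM Students S "
    "LEFT JOIN Grades G ON S.student_id = G.student_id "
    "WHERE G.student_id IS NULL"
).fetchall()

print(not_in)    # [(2, 'Ravi'), (3, 'Meera')]
print(anti_join) # same rows
```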
Q7: Find the course IDs and names of courses that have been taken by more than 5 students.
SELECT C.course_id, C.course_name FROM Courses C JOIN Grades G ON C.course_id = G.course_id GROUP BY C.course_id, C.course_name HAVING COUNT(DISTINCT G.student_id) > 5;
Q8: Get the department-wise count of students.
SELECT department, COUNT(*) AS student_count FROM Students GROUP BY department;
Q9: List the student IDs and names of students who have scored an A grade in a course offered by the CSE department.
SELECT S.student_id, S.name FROM Students S JOIN Grades G ON S.student_id = G.student_id JOIN Courses C ON G.course_id = C.course_id WHERE G.grade = 'A' AND C.department = 'CSE';
Q10: Find the course IDs and names of courses that have the highest credits.
SELECT course_id, course_name FROM Courses WHERE credits = (SELECT MAX(credits) FROM Courses);
💻 Extra SQL Practice Problems – Q&A Format
🔹 Database: m0sales
Table: retail_sales_feb
Q1. How many customers bought a dartboard?
SELECT COUNT(*) FROM m0sales.retail_sales_feb WHERE product = 'Dartboard';
Answer: 3
Q2. How many customers bought a treadmill?
SELECT COUNT(*) FROM m0sales.retail_sales_feb WHERE product = 'Treadmill';
Answer: 7
Q3. How many customers bought something from the Computer department?
SELECT COUNT(DISTINCT customer) FROM m0sales.retail_sales_feb WHERE department = 'Computer';
Answer: 8
Q4. What were the names of customers who bought an Organizer?
SELECT DISTINCT Customer FROM m0sales.retail_sales_feb WHERE product='Organizer';
Answer:
Smith Ben
Martin Dan
Jones Mathew
Rose Clair
Lopez Maria
Q5. What are the names of customers who had Dakic as a sales rep?
SELECT DISTINCT Customer FROM m0sales.retail_sales_feb WHERE sales_rep='Dakic';
Answer:
Sweet Mary
Rose Clair
Martin Dan
Jovanovic Robert
Smith Ben
Q6. Which sales rep had the highest total sales?
SELECT Sales_Rep, SUM(Total) FROM m0sales.retail_sales_feb GROUP BY Sales_Rep ORDER BY SUM(Total) DESC;
Answer (Top 3):
Jenkin: 9158.00
Dakic: 7874.70
Salt: 6429.85
Q7. Which sales rep sold the most laser printers?
SELECT Sales_Rep, SUM(Quantity) FROM m0sales.retail_sales_feb WHERE Product='Laser Printer' GROUP BY Sales_Rep ORDER BY SUM(Quantity) DESC;
Answer:
Dakic: 14
Jenkin: 9
Salt: 3
Q8. How many total items were sold overall?
SELECT SUM(Quantity) FROM m0sales.retail_sales_feb;
Answer: 321
Q9. What is the least expensive product in the database?
SELECT Product FROM m0sales.retail_sales_feb WHERE Price = (SELECT MIN(Price) FROM m0sales.retail_sales_feb);
Answer: Mouse Mat
Q10. What is the name of the customer who had Sheedy as a sales rep and bought a Color Scanner?
SELECT Customer FROM m0sales.retail_sales_feb WHERE Sales_Rep='Sheedy' AND Product='Color Scanner';
Answer: Smith Ben
Q11. Which customers bought products costing more than $1000, and what were those products?
SELECT Customer, Product FROM m0sales.retail_sales_feb WHERE Price > 1000;
Answer:
Nelson Mary – Walkman
Dinh Tran – Walkman
Rose Clair – Digital Camera
🔹 Database: moviedb
Q12. Which customer lives at the address "1531 Sal Drive"?
SELECT customer.first_name, customer.last_name FROM moviedb.customer, moviedb.address WHERE customer.address_id = address.address_id AND address.address='1531 Sal Drive';
Answer: DOROTHY TAYLOR
Q13. How many customers live in Canada?
SELECT COUNT(*) FROM moviedb.customer, moviedb.address, moviedb.city, moviedb.country WHERE customer.address_id = address.address_id AND address.city_id = city.city_id AND city.country_id = country.country_id AND country.country='Canada';
Answer: 5
Q14. Which country has the most customers?
SELECT Country, COUNT(*) FROM moviedb.customer, moviedb.address, moviedb.city, moviedb.country WHERE customer.address_id = address.address_id AND address.city_id = city.city_id AND city.country_id = country.country_id GROUP BY Country ORDER BY COUNT(*) DESC;
Answer: India – 60 customers
Q15. What is the postal code of Cape Coral?
SELECT postal_code FROM moviedb.address, moviedb.city WHERE address.city_id = city.city_id AND city.city='Cape Coral';
Answer: 31342
Q16. What is the largest single payment made by Emily Diaz?
SELECT payment.amount FROM moviedb.payment, moviedb.rental, moviedb.customer WHERE payment.rental_id=rental.rental_id AND rental.customer_id=customer.customer_id AND customer.first_name='Emily' AND customer.last_name='Diaz' AND payment.amount = ( SELECT MAX(payment.amount) FROM moviedb.payment, moviedb.rental, moviedb.customer WHERE payment.rental_id=rental.rental_id AND rental.customer_id=customer.customer_id AND customer.first_name='Emily' AND customer.last_name='Diaz' );
Answer: 8.99
Q17. Which customers made the largest single payment in India? What was the amount?
SELECT payment.amount, customer.first_name, customer.last_name FROM moviedb.payment, moviedb.rental, moviedb.customer, moviedb.address, moviedb.city, moviedb.country WHERE payment.rental_id=rental.rental_id AND rental.customer_id=customer.customer_id AND customer.address_id=address.address_id AND address.city_id=city.city_id AND city.country_id=country.country_id AND country.country='India' AND payment.amount = ( SELECT MAX(payment.amount) FROM moviedb.payment, moviedb.rental, moviedb.customer, moviedb.address, moviedb.city, moviedb.country WHERE payment.rental_id=rental.rental_id AND rental.customer_id=customer.customer_id AND customer.address_id=address.address_id AND address.city_id=city.city_id AND city.country_id=country.country_id AND country.country='India' );
Answer (Customers with payment of 10.99):
Adam Gooch
Sherri Rhodes
Erik Guillen
Tim Cary
Lena Jensen
Anita Morales
John Farnsworth
Bradley Motley (x2)
Lori Wood
Joshua Mark
Nancy Thomas
Module 4
Q1. Explain, with examples, the DDL commands CREATE, ALTER and DROP.
CREATE: The CREATE command is used to define new database objects like tables. It specifies column names, data types, and constraints.
Example (Page 2 & 5):
CREATE TABLE Emp (
    Emp_ID int NOT NULL,
    Name varchar(20) NOT NULL,
    Age int,
    Address varchar(100) DEFAULT 'INDIA',
    PRIMARY KEY (Emp_ID)
);
This creates the Emp table with columns and constraints.
ALTER: The ALTER command modifies an existing table structure, such as adding or dropping columns.
Example (Page 6):
ALTER TABLE Emp ADD Email varchar(100);
This adds an Email column to the Emp table.
DROP: The DROP command deletes an entire table or database object permanently.
Example (General Knowledge):
DROP TABLE Emp;
This removes the Emp table and all its data.
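The full CREATE/ALTER/DROP lifecycle can be exercised end to end with Python's sqlite3 module. This is a sketch on an in-memory database; SQLite's ALTER TABLE supports ADD COLUMN, which is all the example needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define the table with its columns and constraints.
conn.execute("""CREATE TABLE Emp (
    Emp_ID INTEGER NOT NULL PRIMARY KEY,
    Name   VARCHAR(20) NOT NULL,
    Age    INTEGER,
    Address VARCHAR(100) DEFAULT 'INDIA')""")

# ALTER: add a column to the existing structure.
conn.execute("ALTER TABLE Emp ADD COLUMN Email VARCHAR(100)")
cols = [row[1] for row in conn.execute("PRAGMA table_info(Emp)")]
print(cols)  # ['Emp_ID', 'Name', 'Age', 'Address', 'Email']

# DROP: remove the table and all its data permanently.
conn.execute("DROP TABLE Emp")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print(tables)  # []
```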
Q2. How are aliases used in SQL queries? What is the purpose of using an alias?
Aliases are temporary names assigned to columns or tables in SQL queries for better readability and simplicity.
1. Column Alias: Renames a column in the result set for clarity.
SELECT department, CONCAT(name, ' has id-', id) AS employee_code FROM employee;
Here, employee_code is an alias for the concatenated result.
2. Table Alias: Shortens table names, especially in joins.
SELECT e.id, e.name, d.dept_name FROM employee e INNER JOIN department d ON e.dept_id = d.dept_id;
e and d are aliases for employee and department.
Purpose:
Simplifies complex queries.
Avoids ambiguity in multi-table joins.
Improves output readability.
Q3. What is the Left Outer Join, Right Outer Join and Full Outer Join?
1. Left Outer Join: Returns all rows from the left table and matched rows from the right table. Unmatched right table rows appear as NULL.
SELECT id, name, salary, dept_name FROM employee LEFT JOIN department ON employee.dept_id = department.dept_id;
Includes all employees, even those without a department.
2. Right Outer Join: Returns all rows from the right table and matched rows from the left table. Unmatched left table rows appear as NULL.
SELECT id, name, dept_name FROM employee RIGHT JOIN department ON employee.dept_id = department.dept_id;
Includes all departments, even those without employees.
3. Full Outer Join: Combines all rows from both tables, filling NULL for unmatched sides.
-- Simulated in MySQL using UNION:
SELECT * FROM employee LEFT JOIN department ON ...
UNION
SELECT * FROM employee RIGHT JOIN department ON ...;
Supported natively in PostgreSQL as:
SELECT * FROM employee FULL JOIN department ON ...;
Key Difference:
Left Join = All left table rows + matched right rows.
Right Join = All right table rows + matched left rows.
Full Join = All rows from both tables.
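The UNION emulation of a full outer join can be verified on a small dataset. Many SQLite builds also predate native FULL JOIN support, so the same trick applies there (table contents below are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INT, name TEXT, dept_id INT);
CREATE TABLE department (dept_id INT, dept_name TEXT);
INSERT INTO employee VALUES (1, 'Asha', 10), (2, 'Ravi', NULL);
INSERT INTO department VALUES (10, 'Sales'), (20, 'HR');
""")

# LEFT JOIN both ways, then UNION to deduplicate the matched rows.
rows = conn.execute("""
    SELECT e.name, d.dept_name FROM employee e
    LEFT JOIN department d ON e.dept_id = d.dept_id
    UNION
    SELECT e.name, d.dept_name FROM department d
    LEFT JOIN employee e ON e.dept_id = d.dept_id
""").fetchall()

# Asha-Sales is matched; Ravi has no department; HR has no employee.
print(sorted(rows, key=str))
```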
Answer 4: Use of SQL in Procedural Programming Language
SQL can be integrated into procedural programming languages (e.g., Python, Java, C#) using APIs like JDBC, ODBC, or libraries such as SQLAlchemy.
1. Embedded SQL: SQL statements are written directly within procedural code.

import sqlite3
conn = sqlite3.connect("company.db")
cursor = conn.cursor()
cursor.execute("SELECT * FROM Emp WHERE Salary > 5000")
results = cursor.fetchall()

Here, Python executes SQL queries to fetch data.
2. Dynamic SQL: Queries are constructed at runtime.

String sql = "INSERT INTO Emp VALUES (" + id + ", '" + name + "')";
statement.executeUpdate(sql);

Concatenating values like this is vulnerable to SQL injection; prepared statements with placeholders are safer.
3. Stored Procedures: SQL logic is stored in the database and called from programs.

CREATE PROCEDURE GetEmployees()
AS BEGIN
    SELECT * FROM Emp;
END;
Called via cursor.callproc("GetEmployees") in Python.
Purpose: Combines SQL’s data manipulation power with procedural logic (loops, conditions).
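Building on the dynamic-SQL point above: constructing queries by string concatenation is injection-prone, so drivers provide parameter placeholders instead. A sqlite3 sketch with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (Emp_ID INT, Name TEXT, Salary INT)")
conn.execute("INSERT INTO Emp VALUES (1, 'Asha', 7000), (2, 'Ravi', 4000)")

# Placeholders let the driver handle quoting, preventing SQL injection.
emp_id, name = 3, "O'Brien"   # a name with a quote would break naive concatenation
conn.execute("INSERT INTO Emp VALUES (?, ?, ?)", (emp_id, name, 5000))

rows = conn.execute("SELECT Name FROM Emp WHERE Salary > ?", (4500,)).fetchall()
print(rows)  # [('Asha',), ("O'Brien",)]
```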
Answer 5: Triggers and Their Uses (4 Marks)
Trigger (Page 100-102): An automated SQL code block that executes in response to specific events (INSERT/UPDATE/DELETE) on a table.
Syntax (Page 101):
CREATE TRIGGER backup_trigger
AFTER DELETE ON Main
FOR EACH ROW
INSERT INTO Backup VALUES (OLD.id, OLD.salary);
Uses:
1.
Data Integrity: Ensures validations (e.g., preventing negative salaries).
2.
Audit Logs: Tracks changes (e.g., logging deleted records).
3.
Automation: Syncs tables (e.g., backup deleted data).
Disadvantages (Page 102):
Overheads due to automatic execution.
Hard to debug.
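The backup_trigger example can be tried directly in SQLite, which requires a BEGIN ... END block around the trigger body. A minimal sketch with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Main (id INT, salary INT);
CREATE TABLE Backup (id INT, salary INT);
INSERT INTO Main VALUES (1, 5000), (2, 6000);

CREATE TRIGGER backup_trigger AFTER DELETE ON Main
FOR EACH ROW
BEGIN
    INSERT INTO Backup VALUES (OLD.id, OLD.salary);
END;
""")

conn.execute("DELETE FROM Main WHERE id = 1")
backed_up = conn.execute("SELECT * FROM Backup").fetchall()
remaining = conn.execute("SELECT * FROM Main").fetchall()
print(backed_up)  # [(1, 5000)]: the deleted row was copied automatically
print(remaining)  # [(2, 6000)]
```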
Answer 6: Role of SQL in Authentication & Authorization
(General Knowledge – Not in Document)
1. Authentication: SQL verifies user identities via credentials.

CREATE LOGIN User1 WITH PASSWORD = 'Secure123'; -- SQL Server syntax

2. Authorization: SQL controls access using GRANT/REVOKE.

GRANT SELECT ON Emp TO User1;    -- Allows read access
REVOKE DELETE ON Emp FROM User1; -- Blocks deletes

3. Roles: Assign permissions to groups.

CREATE ROLE Manager;
GRANT INSERT, UPDATE ON Emp TO Manager;
Purpose: Ensures only authorized users access/modify data.
Answer 7: SQL in Maintaining Integrity Constraints
SQL enforces data integrity via constraints (Page 3-5, 99):
1.
Domain Constraints: Validate data types.
CREATE TABLE Emp (Age INT CHECK (Age >= 18));
2.
Entity Integrity: Ensures uniqueness (PRIMARY KEY).
CREATE TABLE Emp (Emp_ID INT PRIMARY KEY);
3.
Referential Integrity: Maintains relationships (FOREIGN KEY).
CREATE TABLE Emp ( DeptID INT, FOREIGN KEY (DeptID) REFERENCES Dept(Dept_ID) );
4.
Key Constraints: Uniqueness (UNIQUE).
CREATE TABLE Emp (Email VARCHAR(100) UNIQUE);
Purpose: Prevents invalid data entry and preserves relational accuracy.
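All four constraint kinds can be watched rejecting bad rows in one sqlite3 session. This sketch assumes hypothetical Emp/Dept tables; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is enabled
conn.executescript("""
CREATE TABLE Dept (Dept_ID INTEGER PRIMARY KEY, Dept_Name TEXT);
CREATE TABLE Emp (
    Emp_ID INTEGER PRIMARY KEY,
    Email  VARCHAR(100) UNIQUE,
    Age    INTEGER CHECK (Age >= 18),
    DeptID INTEGER REFERENCES Dept(Dept_ID));
INSERT INTO Dept VALUES (1, 'CSE');
INSERT INTO Emp VALUES (1, 'a@x.com', 25, 1);
""")

violations = []
for stmt in [
    "INSERT INTO Emp VALUES (1, 'b@x.com', 30, 1)",   # entity integrity: duplicate key
    "INSERT INTO Emp VALUES (2, 'a@x.com', 30, 1)",   # key constraint: duplicate email
    "INSERT INTO Emp VALUES (3, 'c@x.com', 15, 1)",   # domain constraint: age below 18
    "INSERT INTO Emp VALUES (4, 'd@x.com', 30, 99)",  # referential integrity: no dept 99
]:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as e:
        violations.append(type(e).__name__)

print(violations)  # ['IntegrityError', 'IntegrityError', 'IntegrityError', 'IntegrityError']
```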

Module 5
Q1. What is Insertion, Updation and Deletion Anomaly?
Answer:
Insertion, updation, and deletion anomalies arise due to improper database design, particularly when there is redundancy and lack of normalization in a relation.
Insertion Anomaly:
Occurs when certain data cannot be inserted into a table without the presence of other data. For example, if instructor and department data are stored in the same relation (inst_dept), we cannot insert a new department unless we also insert an instructor.
Reference implied from Page 16.8
Updation Anomaly:
Happens when the same data is stored in multiple places and updating it in one place but not the others leads to inconsistency. For instance, changing a department's budget in one tuple but not in the rest leaves the relation inconsistent.
Reference implied from Page 16.8
Deletion Anomaly:
Occurs when deletion of data about one entity results in unintentional loss of data about another. For example, deleting the only instructor in a department may remove all data about that department if both are stored in one table.
Reference implied from Page 16.8-16.9
These anomalies indicate the need for decomposition and normalization to achieve a better relational design.
Note: These terms are not explicitly named in the document, but the concept is well illustrated through examples and decomposition problems.
Q2. Define First, Second and Third Normal Form.
Answer:
First Normal Form (1NF):
A relation is in 1NF if all its attribute domains are atomic — meaning each field contains only indivisible values. This prevents multi-valued or composite attributes. For example, a customer should not have a set of phone numbers in one attribute.
Reference: Page 16.12-16.14
Second Normal Form (2NF):
A relation is in 2NF if it is in 1NF and all non-prime attributes are fully functionally dependent on the entire candidate key. It removes partial dependencies — i.e., no non-prime attribute should depend on a part (subset) of a composite key.
Not found in document; based on standard DBMS definition.
Third Normal Form (3NF):
A relation is in 3NF if for every functional dependency α → β, one of the following holds:
1.
It is a trivial dependency (β ⊆ α),
2.
α is a superkey, or
3.
Every attribute in β – α is a prime attribute (i.e., part of some candidate key).
3NF relaxes BCNF slightly to preserve dependencies.
Reference: Page 17.9
Q3. List two desirable properties of decompositions.
Answer:
Two key properties of a good relational schema decomposition are:
1.
Lossless-Join Property:
This ensures that the original relation can be perfectly reconstructed by joining the decomposed relations. It avoids loss of information and maintains data integrity.
Example: Decomposing a relation R(A, B, C) into R1(A, B) and R2(B, C) is lossless if B is a key for either R1 or R2.
Reference: Page 16.10, also concept supported on Page 17.7
2.
Dependency Preservation:
A decomposition is dependency-preserving if all functional dependencies in the original relation can be enforced without needing to recombine relations. This helps in efficient constraint checking on individual tables.
Reference: Page 17.8
Both properties are essential to ensure the decomposition is practical and does not lead to anomalies or inefficiencies.
4) What is the major aim of denormalization?
Answer:
The major aim of denormalization is to improve the performance of a database by reducing the number of joins required during data retrieval. In highly normalized databases, data is spread across many tables. While this design minimizes redundancy and anomalies, it can lead to complex queries and slower performance due to multiple joins.
Denormalization introduces some controlled redundancy by combining tables or adding derived attributes to reduce query time. It is especially useful in read-heavy applications such as data warehousing or reporting systems.
However, it comes at the cost of increased storage and the potential for data anomalies during updates.
Not found in attached document; based on standard DBMS knowledge.
5) What is Multivalued Dependency?
Answer:
A Multivalued Dependency (MVD) occurs in a relation when one attribute in a table uniquely determines another set of attributes, independently of other attributes.
Formally, in a relation R, an MVD X ↠ Y holds if, for each value of X, there exists a set of values for Y that is independent of the values in the rest of the attributes (Z = R - X - Y).
Example:
If a person can have multiple phone numbers and multiple email addresses, but these are independent of each other, then:
Person ↠ Phone and Person ↠ Email are multivalued dependencies.
MVDs are used to detect and resolve redundancy beyond functional dependencies and lead to Fourth Normal Form (4NF) decomposition.
Not found in attached document; based on standard DBMS knowledge.
6) Explain the concept of Functional Dependency.
Answer:
A Functional Dependency (FD) is a constraint between two sets of attributes in a relation. Given a relation R, a functional dependency α → β means that if two tuples agree on attributes α, they must also agree on attributes β.
In other words, the value of α uniquely determines the value of β.
For example, in a student relation:
Roll_No → Name, Department
This means that a particular Roll_No determines a unique combination of Name and Department.
Functional dependencies are fundamental for:
Identifying keys
Normalizing schemas
Avoiding redundancy and anomalies
They form the theoretical basis for normal forms and relational schema design.
Reference: Page 16.20–16.22 (Definition and Examples)
7) What is closure and minimal cover?
Answer:
Closure:
The closure of a set of functional dependencies (F), denoted as F⁺, is the set of all functional dependencies that can be logically inferred from F using inference rules such as Armstrong’s Axioms (reflexivity, augmentation, transitivity).
Similarly, attribute closure (α⁺) refers to all attributes functionally determined by a set of attributes α.
Reference: Page 17.16–17.18
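The attribute-closure computation is mechanical enough to sketch in a few lines of Python. The relation and FDs below are hypothetical, chosen to mirror the usual textbook pattern:

```python
# Attribute closure under Armstrong's axioms: repeatedly add the right-hand
# side of any FD whose left-hand side is already contained in the closure.
def attribute_closure(attrs, fds):
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# Hypothetical relation R(A, B, C, D) with F = {A -> B, B -> C}.
fds = [("A", "B"), ("B", "C")]
print(sorted(attribute_closure("A", fds)))   # ['A', 'B', 'C'] -> A+ misses D, so A is not a superkey
print(sorted(attribute_closure("AD", fds)))  # ['A', 'B', 'C', 'D'] -> AD+ covers R, so AD is a superkey
```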
Minimal Cover (Canonical Cover):
A minimal cover is a simplified version of a set of functional dependencies that is:
1.
Logically equivalent to the original set,
2.
Has no redundant dependencies,
3.
And has no extraneous attributes in any dependency.
It is used to optimize and simplify the set of dependencies while ensuring that no necessary constraints are lost.
Reference: Page 17.13–17.14
8) Give an example to illustrate the need for normalization.
Answer:
Consider the following unnormalized table:
StudentID | Name           | Courses  | Institute
----------|----------------|----------|-----------
101       | Naveen Kumar   | DBMS, OS | IIT Delhi
102       | Utkarsh Tiwari | CN, COA  | IIT Bombay
102       | Utkarsh Tiwari | OS       | IIT Kanpur
Issues Identified:
Redundancy: The student's name and ID are repeated for each course enrolled, leading to unnecessary duplication.
Update Anomaly: If Utkarsh Tiwari's name changes, it must be updated in multiple rows; failing to do so leads to inconsistency.
Insertion Anomaly: Adding a new course without a student is not possible unless we allow nulls or dummy data.
Deletion Anomaly: Removing the last course of a student could inadvertently remove all their information.
Normalization Process:
To address these issues, we apply normalization:
1.
First Normal Form (1NF): Ensure atomicity by separating multiple courses into individual rows.
StudentID | Name           | Course | Institute
----------|----------------|--------|-----------
101       | Naveen Kumar   | DBMS   | IIT Delhi
101       | Naveen Kumar   | OS     | IIT Delhi
102       | Utkarsh Tiwari | CN     | IIT Bombay
102       | Utkarsh Tiwari | COA    | IIT Bombay
102       | Utkarsh Tiwari | OS     | IIT Kanpur
2.
Second Normal Form (2NF): Remove partial dependencies by creating separate tables:
Students Table:

StudentID | Name
----------|---------------
101       | Naveen Kumar
102       | Utkarsh Tiwari

Courses Table:

CourseID | CourseName
---------|-----------
C1       | DBMS
C2       | OS
C3       | CN
C4       | COA

Enrollments Table:

StudentID | CourseID | Institute
----------|----------|-----------
101       | C1       | IIT Delhi
101       | C2       | IIT Delhi
102       | C3       | IIT Bombay
102       | C4       | IIT Bombay
102       | C2       | IIT Kanpur
Benefits:
Eliminates Redundancy: Student information is stored once.
Prevents Anomalies: Updates, insertions, and deletions are handled without inconsistencies.
Improves Data Integrity: Ensures accurate and consistent data across the database.
9) What is attribute preservation and dependency preservation?
Answer:
In the context of database normalization and decomposition:
Attribute Preservation:
Ensures that all original attributes of a relation are present in the decomposed relations.
No attribute from the original schema is lost during decomposition.
Essential for reconstructing the original relation from its decomposed parts.
Dependency Preservation:
Ensures that all functional dependencies from the original relation are still enforceable in the decomposed relations without the need to recombine them.
Important for maintaining data integrity and enforcing constraints efficiently.
Illustration:
Consider a relation R(A, B, C) with functional dependencies:
A → B
B → C
Decomposing R into:
R1(A, B)
R2(B, C)
Attribute Preservation: All attributes A, B, and C are present in R1 and R2.
Dependency Preservation:
A → B is preserved in R1.
B → C is preserved in R2.
Therefore, the decomposition preserves dependencies.
10) What is the lack of redundancy in decomposition?
Answer:
Lack of redundancy in decomposition refers to the elimination of unnecessary duplication of data when a relation is decomposed into smaller relations. The goal is to:
Minimize Data Duplication: Ensure that each piece of information is stored only once.
Prevent Anomalies: Reduce the risk of update, insertion, and deletion anomalies.
Enhance Data Integrity: Maintain consistency across the database.
Example:
Consider a relation R(StudentID, Course, Instructor) where:
A student can enroll in multiple courses.
Each course has a single instructor.
If this relation is not decomposed:
Instructor information is repeated for every student enrolled in the course.
By decomposing R into:
R1(Course, Instructor)
R2(StudentID, Course)
We achieve:
Reduced Redundancy: Instructor information is stored once per course.
Improved Integrity: Changes to instructor details need to be made in only one place.
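The lossless-join claim behind this decomposition can be checked concretely: decompose a small hypothetical instance of R(StudentID, Course, Instructor) and verify that the natural join of the pieces reproduces it exactly:

```python
# R(StudentID, Course, Instructor): Course -> Instructor holds, so splitting on
# Course is a lossless-join decomposition (Course is a key of R1).
R = {(101, 'DBMS', 'Rao'), (102, 'DBMS', 'Rao'), (101, 'OS', 'Sen')}

R1 = {(c, i) for (_, c, i) in R}   # (Course, Instructor): instructor stored once per course
R2 = {(s, c) for (s, c, _) in R}   # (StudentID, Course): enrollments

# Natural join of R2 and R1 on Course reconstructs the original relation.
rejoined = {(s, c, i) for (s, c) in R2 for (c2, i) in R1 if c == c2}
print(rejoined == R)  # True: no information was lost
```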
Module 6
1) Explain the ACID properties of a transaction.
Answer:
The ACID properties are fundamental principles that ensure reliable transaction processing in database systems. They are as follows:
Atomicity: This principle dictates that a transaction is an indivisible unit, meaning it either completes entirely or not at all. If a transaction encounters any issue during execution, all its changes are rolled back to the state before the transaction started. This guarantees that partial updates to the database will not be visible to others, preventing inconsistent or corrupted states.
Example: Imagine a bank transfer where $100 is moved from Account A to Account B. If the transaction fails after deducting $100 from Account A but before crediting Account B, the entire transaction will be rolled back to maintain consistency.
Consistency: A transaction takes the database from one valid state to another valid state, ensuring that all integrity constraints (like foreign keys, unique constraints, etc.) are satisfied. The database must always comply with predefined rules, even after the transaction is completed.
Example: In a scenario where money is transferred between accounts, the consistency property ensures that the total balance of all accounts in the database remains consistent and adheres to the rules of accounting.
Isolation: This property ensures that the operations of one transaction are isolated from others. Even though transactions may run concurrently, each transaction should execute as though it is the only one running on the system. This prevents anomalies such as dirty reads (reading uncommitted data from another transaction).
Example: If two users simultaneously update their bank account balances, isolation ensures that they do not interfere with each other’s transactions. The transactions behave as if they are executed serially, one after the other.
Durability: Once a transaction has been committed, its changes are permanent and survive any system failures, like crashes. The system ensures that committed transactions are stored in non-volatile memory, so even if the database crashes, the data is preserved.
Example: After a user successfully transfers money from one account to another and the transaction commits, the transaction’s effect (the updated balances) will still be reflected even if the system crashes immediately afterward.
Additional Insight from Internet Sources:
Modern databases, including those used in NoSQL systems like MongoDB, extend ACID compliance to handle scenarios such as multi-document transactions. These databases support atomicity at the document level and ensure consistency through mechanisms like snapshot isolation. However, distributed systems often struggle with full ACID compliance due to challenges like network latency, failure recovery, and partitioning. To address these, systems may rely on protocols like the two-phase commit protocol, which is designed to ensure consistency in distributed transactions.
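Atomicity in particular is easy to observe with sqlite3: wrap both halves of a transfer in one transaction and let a CHECK constraint abort it. Account names and balances below are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INT CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()

try:
    with conn:  # one transaction: both updates commit, or neither does
        conn.execute("UPDATE account SET balance = balance - 600 WHERE name = 'A'")
        conn.execute("UPDATE account SET balance = balance + 600 WHERE name = 'B'")
except sqlite3.IntegrityError:
    pass  # the debit would make A negative, so the whole transfer rolls back

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 500, 'B': 100}: no partial update survives
```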
2) What concurrency anomalies (lost update, inconsistent retrievals, uncommitted dependency) are possible if there is no concurrency control?
Answer:
Without concurrency control, various anomalies can arise that disrupt the consistency and correctness of the database. The most common anomalies include:
1.
Lost Update: This occurs when two transactions read the same data and then modify it independently, but one transaction’s update is lost because the second transaction overwrites it. This happens when both transactions are unaware of each other’s changes, leading to an inconsistency.
Example: Transaction T1 reads a product price, applies a discount, and updates the price. Meanwhile, Transaction T2 also reads the same product price, applies a different discount, and updates the price. If T2’s update is applied last, T1’s changes are lost.
2.
Inconsistent Retrievals: This happens when a transaction reads data that is in the process of being modified by another transaction, leading to inconsistent or intermediate results. The transaction may see data in a partially updated state, which is not a valid or committed version.
Example: Transaction T1 is updating an employee’s salary, but Transaction T2 reads the employee’s salary while the update is not yet complete. T2 might retrieve an inconsistent salary value.
3.
Uncommitted Dependency (Dirty Read): In this case, one transaction reads data that is written by another transaction that has not yet been committed. If the first transaction aborts, the second transaction ends up with data that is not valid, which can lead to inconsistent results.
Example: Transaction T1 updates an account balance but hasn’t yet committed. Transaction T2 reads this uncommitted balance. If T1 aborts, T2 ends up working with invalid data.
Additional Insight from Internet Sources:
Other concurrency anomalies include:
Non-repeatable Read: A transaction reads the same data twice but sees different values because another transaction has modified it in between. This breaks the assumption that the data will remain consistent throughout the transaction.
Phantom Read: This occurs when a transaction queries a range of data, and another transaction inserts, deletes, or modifies records within that range during the execution of the first transaction. As a result, the first transaction sees different sets of data before and after the query.
Write Skew: Transactions that execute concurrently can lead to situations where conflicting updates are made based on stale or inconsistent data. For example, two doctors scheduling their on-call times, unaware that they are creating a conflict by both marking themselves as unavailable at the same time.
Concurrency control techniques such as locking, timestamp ordering, and optimistic concurrency control are designed to prevent these anomalies. Methods like two-phase locking (2PL) and multi-version concurrency control (MVCC) help serialize access to shared data, ensuring consistency while allowing transactions to run concurrently.
3) What is a recoverable schedule?
Answer:
A recoverable schedule is one in which transactions are scheduled in such a way that no transaction commits before all the transactions whose data it has read have committed. This ensures that, in the event of a failure, no transaction’s commit will be invalidated by the rollback of another transaction on which it depends. The key property of a recoverable schedule is that it prevents cascading rollbacks, which could otherwise propagate the effects of a failure across multiple transactions.
Key Condition: If transaction Tj reads data written by Ti, then Ti must commit before Tj can commit. This guarantees that if Ti aborts, Tj will also be able to detect the failure and abort itself, maintaining the integrity of the database.
Example:
Non-recoverable: Suppose T1 writes data A, and T2 reads A and then commits. If T1 later aborts, the commit of T2 becomes invalid, leading to inconsistencies.
Recoverable: In a recoverable schedule, T2 would not be allowed to commit until T1 commits, ensuring that if T1 aborts, T2 will also roll back, avoiding invalid data.
Additional Insight from Internet Sources:
A schedule can be serializable but still not recoverable. For example, in a non-recoverable schedule, T2 might commit before T1, even though T2 reads data written by T1. A cascadeless schedule, which is a stricter form of recoverability, prevents transactions from reading uncommitted data altogether. Cascadeless schedules are often enforced in practical systems to avoid the need for cascading rollbacks in case of failures.
4) What is final state serializability?
Answer:
Final state serializability is a weaker form of serializability where the database’s final state after executing a schedule must match the final state of some serial execution, regardless of the intermediate states. This property does not require the schedule to be serializable at every intermediate step, only that the end result of the transactions is the same as if the transactions were executed serially.
Contrast with Conflict Serializability: Conflict serializability requires that the schedule’s conflict graph be acyclic, which ensures that there are no cycles in the read-write conflicts between transactions. This guarantees that the execution can be reordered into a serial form that produces consistent intermediate states. Final state serializability, on the other hand, only cares about the final result, which allows for more concurrency but is harder to verify and enforce.
Example: Consider two transactions that make updates to different records, which are not read or modified by the other transaction. Even though these transactions may conflict in the order of execution, their final outcome is the same as if they had been executed serially.
Additional Insight from Internet Sources:
While conflict serializability is easier to implement and verify through methods like two-phase locking, final-state serializability allows for more flexibility and concurrency in schedules. However, it is computationally expensive to verify, as it requires checking all possible serial execution orders. This makes it less commonly used in practice, especially in large-scale databases where performance is critical.
5) What is serializability?
Answer:
Serializability is the property that ensures the concurrent execution of transactions yields the same results as if the transactions were executed serially in some order, one after the other, without any interleaving. Serializability guarantees that the database remains in a consistent state, avoiding issues such as lost updates, dirty reads, and inconsistent retrievals.
Serializable Schedule: A schedule is serializable if there exists some serial execution (an execution where transactions are executed one after the other without interleaving) that produces the same outcome as the concurrent execution. The key goal is to prevent anomalies caused by the interleaving of transactions.
Conflict Serializability: This is the most common form of serializability. A schedule is conflict-serializable if the order of transactions can be rearranged without violating transaction dependencies (i.e., the order of conflicting operations doesn’t change).
View Serializability: This form is less restrictive and focuses on view equivalence: a schedule is view serializable if it is view equivalent to some serial schedule, i.e., transactions read the same values and perform the same final writes as in that serial order.
6) What is VSR?
VSR, or View Serializability (from internet source), is a form of serializability that defines equivalence between schedules based on their views of the data. Two schedules are view equivalent if they meet specific conditions related to how transactions read the initial values of data items, read values written by other transactions, and perform the final writes on data items.
The conditions for View Serializability are as follows (from internet source):
1. Both schedules must perform the same set of actions.
2. If transaction Ti reads the initial value of a data item in one schedule, it must also do so in the other schedule.
3. If transaction Ti reads a value written by transaction Tj in one schedule, it must also do so in the other schedule.
4. If transaction Ti performs the final write on a data item in one schedule, it must also do so in the other schedule.
View Serializability is less restrictive than Conflict Serializability. This means that some schedules that are view serializable are not conflict serializable. Determining whether a schedule is view serializable can be complex.
7) What is CSR?
CSR, or Conflict Serializability (from internet source), is a type of serializability that focuses on conflicting operations between transactions. Two operations are considered to be in conflict if they belong to different transactions, operate on the same data item, and at least one of them is a write operation. A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations.
In simpler terms, if a schedule can be reordered through a series of swaps of non-conflicting operations to match a serial schedule, then it is conflict serializable. Conflict Serializability ensures that the concurrent execution of transactions does not lead to any inconsistencies caused by conflicting operations. Conflict Serializability is stricter than View Serializability.
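The conflict test described above (different transactions, same data item, at least one write) is small enough to state directly in code. A minimal sketch, with a hypothetical `Op` tuple format that is not part of the notes:

```python
from collections import namedtuple

# An operation: the transaction that issued it, "R" or "W", and the data item.
Op = namedtuple("Op", ["txn", "kind", "item"])

def conflicts(a: Op, b: Op) -> bool:
    """Two operations conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write."""
    return (a.txn != b.txn
            and a.item == b.item
            and "W" in (a.kind, b.kind))

print(conflicts(Op("T1", "W", "A"), Op("T2", "R", "A")))  # True: write-read on A
print(conflicts(Op("T1", "R", "A"), Op("T2", "R", "A")))  # False: two reads never conflict
```

Swapping any pair of adjacent operations for which `conflicts` returns False leaves the schedule's outcome unchanged, which is exactly the reordering argument used above.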
8) What does 2PL do?
2PL, or Two-Phase Locking Protocol (Page 2, Lesson 11), is a concurrency control protocol designed to ensure serializability in database transactions. It operates by dividing a transaction's execution into two distinct phases: a growing phase and a shrinking phase.
1. Growing Phase: During this phase, a transaction can acquire locks on data items but cannot release any locks. The transaction requests and obtains all the locks it needs before proceeding.
2. Shrinking Phase: In this phase, a transaction can release locks but cannot acquire any new locks. Once a transaction releases a lock, it cannot obtain any additional locks.
By adhering to these two phases, 2PL ensures that the resulting schedule is conflict serializable, thereby preventing concurrency anomalies and maintaining data consistency. The strict enforcement of lock acquisition and release helps avoid issues such as lost updates, dirty reads, and inconsistent retrievals. (from internet source) There are variations of 2PL, such as strict 2PL, which releases locks only after the transaction has either committed or aborted, providing an even higher level of data integrity.
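The two-phase rule itself is easy to enforce: once a transaction releases any lock, it may not acquire another. A minimal sketch (hypothetical class names; no lock sharing, blocking, or deadlock handling):

```python
class TwoPhaseLockingViolation(Exception):
    pass

class Transaction:
    """Tracks only the 2PL rule: after the first unlock, no more locks."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # flips to True on the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingViolation(
                f"{self.name} tried to lock {item} after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True  # growing phase is over

t = Transaction("T1")
t.lock("A"); t.lock("B")   # growing phase: acquire freely
t.unlock("A")              # first release starts the shrinking phase
try:
    t.lock("C")            # violates 2PL
except TwoPhaseLockingViolation as e:
    print("rejected:", e)
```

Strict 2PL would simply keep `shrinking` from ever mattering: all unlocks happen together at commit or abort.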
9) What are locks?
Locks are mechanisms used in database management systems to control concurrent access to data items (Page 2, Lesson 11).
Purpose: To prevent multiple transactions from interfering with each other when accessing the same data.
Functionality: When a transaction needs to access a data item, it requests a lock on that item. If the lock is granted, the transaction can proceed with its operation. The lock prevents other transactions from accessing the same data item in a way that could lead to inconsistencies (from internet source).
Types of Locks:
Shared Locks (Read Locks): Allow multiple transactions to read the same data item concurrently. No transaction can modify the data while a shared lock is held.
Exclusive Locks (Write Locks): Allow only one transaction to write to a data item. No other transaction can read or write the data item while an exclusive lock is held.
Compatibility: Locks have compatibility rules that determine whether a lock can be granted to a transaction based on the locks already held on the data item (from internet source). For example, multiple shared locks can be held on a data item, but an exclusive lock cannot be held concurrently with any other lock.
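The standard shared/exclusive compatibility rules described above fit in a four-entry matrix. A small sketch (hypothetical function name):

```python
# Shared (S) / exclusive (X) lock compatibility matrix.
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share an item
    ("S", "X"): False,  # a held read lock blocks a writer
    ("X", "S"): False,  # a held write lock blocks readers
    ("X", "X"): False,  # writers are mutually exclusive
}

def can_grant(requested, held_locks):
    """Can a lock of mode `requested` ('S' or 'X') be granted,
    given the modes of locks already held on the item?"""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

print(can_grant("S", ["S", "S"]))  # True: shared locks coexist
print(can_grant("X", ["S"]))       # False: exclusive must wait for readers
```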
10) Give an example of a precedence graph with a cycle.
A precedence graph (not explicitly found in the attached document, but related to serializability on Page 2, Lesson 11) is a directed graph used to determine the serializability of a schedule.
Nodes: Each node in the graph represents a transaction.
Edges: A directed edge from transaction Ti to transaction Tj indicates that Ti must occur before Tj in any equivalent serial schedule. This edge is created when there is a conflict between an operation in Ti and an operation in Tj on the same data item, and at least one of the operations is a write.
Cycle: A cycle in the precedence graph indicates that the schedule is not conflict serializable (from internet source). This means there is no equivalent serial order of transactions that produces the same result.
Example:
Consider the following schedule with transactions T1, T2, and T3:
1. T1: Write(A)
2. T2: Write(B)
3. T3: Write(C)
4. T2: Read(A)
5. T3: Read(B)
6. T1: Read(C)
T1 writes A, and T2 reads A, so there is an edge from T1 to T2.
T2 writes B, and T3 reads B, so there is an edge from T2 to T3.
T3 writes C, and T1 reads C, so there is an edge from T3 to T1.
This creates a cycle: T1 -> T2 -> T3 -> T1, indicating the schedule is not conflict serializable.
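The edges and the cycle in the example above can be checked mechanically: scan each pair of operations for conflicts, add an edge for each, then run a depth-first search for a cycle. A sketch under those assumptions (the operation-tuple encoding is ours, not from the notes):

```python
# The schedule from the example, encoded as (transaction, R/W, item).
schedule = [
    ("T1", "W", "A"), ("T2", "W", "B"), ("T3", "W", "C"),
    ("T2", "R", "A"), ("T3", "R", "B"), ("T1", "R", "C"),
]

# Build the precedence graph: an edge Ti -> Tj for each conflict where
# Ti's operation comes first (same item, different txns, >= one write).
edges = set()
for i, (ti, ki, xi) in enumerate(schedule):
    for tj, kj, xj in schedule[i + 1:]:
        if ti != tj and xi == xj and "W" in (ki, kj):
            edges.add((ti, tj))

def has_cycle(edges):
    """DFS cycle detection; fine for a handful of transactions."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    visiting, done = set(), set()
    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, []):
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(n) for n in graph if n not in done)

print(sorted(edges))     # [('T1', 'T2'), ('T2', 'T3'), ('T3', 'T1')]
print(has_cycle(edges))  # True -> not conflict serializable
```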
11) What is the main feature of timestamping concurrency control?
The main feature of timestamping concurrency control (not explicitly found in the attached document, but related to concurrency control on Page 2, Lesson 11) is the assignment and use of unique timestamps to order transaction execution (from internet source).
Timestamp Assignment: Each transaction is assigned a unique timestamp when it starts. This timestamp is typically based on the system clock or a logical counter.
Ordering of Transactions: Transactions are executed in the order of their timestamps. The system ensures that if transaction Ti has an earlier timestamp than transaction Tj, then Ti appears to execute before Tj.
Conflict Resolution: When conflicting operations occur, the system uses the timestamps to determine whether to allow the operation or reject it. For example, if a transaction tries to read a data item that has been written by a later transaction, the read operation may be rejected and the transaction rolled back.
Lock-Free: Timestamping concurrency control avoids the use of locks, which can reduce the overhead associated with lock management and prevent deadlocks.
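The read/write rejection rules above amount to keeping, per data item, the largest timestamp that has read it and the largest that has written it. A minimal sketch of basic timestamp ordering (hypothetical class names; the Thomas write rule and restarts are omitted):

```python
class Rollback(Exception):
    pass

class TimestampScheduler:
    """Basic timestamp-ordering checks, not a production scheduler."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that read it
        self.write_ts = {}  # item -> largest timestamp that wrote it

    def read(self, ts, item):
        # Reject a read of an item already written by a younger transaction.
        if ts < self.write_ts.get(item, 0):
            raise Rollback(f"T(ts={ts}) read {item} too late")
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)

    def write(self, ts, item):
        # Reject a write if a younger transaction already read or wrote it.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            raise Rollback(f"T(ts={ts}) wrote {item} too late")
        self.write_ts[item] = ts

s = TimestampScheduler()
s.write(ts=2, item="A")     # the younger T2 writes A first
try:
    s.read(ts=1, item="A")  # older T1 arrives too late: rolled back
except Rollback as e:
    print(e)
```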
12) Give one example of a deadlock prevention technique.
One example of deadlock prevention (Page 2, Lesson 11) is to impose a total ordering of all lockable data items and require that a transaction can only request locks in increasing order (from internet source).
Total Ordering: Assign a unique number or identifier to each data item in the database.
Lock Request Order: Require that a transaction can only request locks on data items in increasing order based on their assigned identifiers.
Circular Wait Prevention: By enforcing this ordering, it becomes impossible for transactions to form a circular wait, which is a necessary condition for a deadlock.
Example:
If data item A has ID 1 and data item B has ID 2, a transaction that needs to lock both A and B must first lock A before attempting to lock B.
This prevents a scenario where transaction T1 holds a lock on A and is waiting for B, while transaction T2 holds a lock on B and is waiting for A, thus avoiding a deadlock.
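The ordering rule can be sketched as a simple check on each transaction's lock-request sequence (the item-ID table and function name are hypothetical):

```python
ITEM_ID = {"A": 1, "B": 2, "C": 3}  # the global total ordering of data items

class OrderViolation(Exception):
    pass

def acquire_in_order(items):
    """Return the lock-request sequence for `items`, refusing any request
    whose item ID is not strictly greater than the last one granted."""
    last_id = 0
    sequence = []
    for item in items:
        if ITEM_ID[item] <= last_id:
            raise OrderViolation(f"{item} requested out of order")
        last_id = ITEM_ID[item]
        sequence.append(item)
    return sequence

print(acquire_in_order(["A", "B"]))  # fine: IDs 1 then 2
try:
    acquire_in_order(["B", "A"])     # would enable a circular wait
except OrderViolation as e:
    print("rejected:", e)
```

Because every transaction climbs the same ID ladder, no two transactions can each hold an item the other is waiting for, so the circular-wait condition can never arise.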
PYQ
i) Show all the unique ages of the patients, sorted in ascending order.
SQL
SELECT DISTINCT Age FROM patients ORDER BY Age ASC;
ii) Show first and last names of patients who do not have any allergies.
SQL
SELECT FName, LName FROM patients WHERE allergies IS NULL;
iii) Update the patient's table for the allergies column. If the patient's allergies are null then replace it with "NKA".
SQL
UPDATE patients SET allergies = 'NKA' WHERE allergies IS NULL;
iv) Show patient_id and FName from patients whose name starts and ends with 'S' and FName is 6 characters long.
SQL
SELECT patient_id, FName FROM patients WHERE FName LIKE 'S%S' AND LENGTH(FName) = 6;
v) Show the city and the total number of patients in the city. Order from most to least patients and then by city name ascending.
SQL
SELECT City, COUNT(patient_id) AS TotalPatients FROM patients GROUP BY City ORDER BY TotalPatients DESC, City ASC;
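Queries like (v) above are easy to sanity-check against a throwaway in-memory SQLite database. A sketch with a toy patients table and made-up sample rows (the data is illustrative only):

```python
import sqlite3

# In-memory database with a minimal patients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INT, FName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"),
     (3, "Meera", "Pune"), (4, "Sam", "Agra")],
)

# Query (v): patients per city, most first, ties broken by city name.
rows = conn.execute(
    "SELECT City, COUNT(patient_id) AS TotalPatients "
    "FROM patients GROUP BY City "
    "ORDER BY TotalPatients DESC, City ASC"
).fetchall()
print(rows)  # [('Pune', 2), ('Agra', 1), ('Delhi', 1)]
```

Pune has two patients so it sorts first; Agra and Delhi tie at one each, so the secondary `City ASC` key puts Agra before Delhi.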