by Aparna V. Kaliappan
A database backup is a copy of the data, logs, and structural information necessary to restore a database to its state at a certain point in time. This is especially important after unexpected events such as power outages, hardware failures, or malware attacks, which can lead to data corruption or data loss in a database.
Thus, database backup plans are a vital component of a company’s data storage, retention, and recovery procedures, in order to reduce response and recovery time in such situations. Database backups also often serve to meet legal or contractual data retention requirements.
Database backups are essential for software project failure expert witnesses, who analyze and interpret the backed-up data to reach objective determinations about claims relating to data stored in a database. Such data could include project requirements, software defects, revision histories, and testing results, among others.
Given a database backup file, software experts can restore the database in a local version of the same Database Management System (DBMS) from which the backup file was created, such as Oracle or Microsoft SQL Server. Doing this will then allow experts to write Structured Query Language (SQL) queries to extract, analyze, and interpret the necessary data from the relevant database tables.
Experts can also analyze data present in data exports such as CSV files or spreadsheets. These exports can often be manually created by an expert from a restored database backup, but can also be received directly during document production for a matter.
This blog post discusses the most commonly used database backup methods and data export types.
Computer forensics experts can use database backups to analyze a large variety of data typically stored in a database during a software project. Such data can include project requirements, design information, quality assurance statistics, artifact revision histories, software defects, testing statistics, and help-desk tickets, among others.
After restoring a database backup, experts can write and execute SQL queries and review the resulting data to perform specific analyses. For instance, given artifact revision history data, experts can identify particular changes that occurred at specific points in time. Similarly, given software defect and testing data, experts can compute metrics such as defect density, defects by severity, and defect removal efficiency.
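As a minimal sketch of this kind of analysis, the following uses Python's built-in sqlite3 module against an in-memory database. The table schema, column names, and sample rows are hypothetical; a real defect-tracking database would have its own structure.

```python
import sqlite3

# Hypothetical defect table with illustrative sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE defects (
        id INTEGER PRIMARY KEY,
        severity TEXT,
        found_phase TEXT   -- e.g. 'testing' or 'production'
    )""")
conn.executemany(
    "INSERT INTO defects (severity, found_phase) VALUES (?, ?)",
    [("High", "testing"), ("High", "production"),
     ("Medium", "testing"), ("Low", "testing")])

# Defects by severity.
by_severity = conn.execute(
    "SELECT severity, COUNT(*) FROM defects "
    "GROUP BY severity ORDER BY severity").fetchall()

# Defect removal efficiency: share of defects found before release.
total = conn.execute("SELECT COUNT(*) FROM defects").fetchone()[0]
caught = conn.execute(
    "SELECT COUNT(*) FROM defects WHERE found_phase = 'testing'"
).fetchone()[0]
dre = caught / total
```

The same queries would run against any DBMS once the backup is restored; only the connection setup changes.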
There are many types of database backups, each differentiated by factors such as the amount of data backed up, the storage space of the backup, and the restoration time. The available database backup types can vary based on the particular DBMS used. Some of the most common types of database backups are discussed below.
- A full backup is a backup of the entire database at a particular point in time. A full backup takes more time and more storage space than other backup methods, because the entire data set is backed up each time, leading to duplication between stored backups. To restore the entire database, an expert would need access to just the latest full backup.
- An incremental backup is a backup of all changes in the database since the last incremental backup. It takes less time and uses less space than a full backup, since only the changes are backed up. However, to restore the entire database, an expert would need access to the latest full backup and every incremental backup created since it: the full backup is restored first, followed by each incremental backup in sequence.
- A differential backup is a backup of all changes in the database since the last full backup. Differential backups take more space than incremental backups due to duplication between stored backups. However, to restore the entire database, an expert would need access to only the latest full backup and the latest differential backup: the full backup is restored first, followed by the differential backup.
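The restore chains described above can be sketched as a small helper. This is purely illustrative (the labels and function name are invented for this post) and simply returns which backups in a chronological list would be needed to restore the latest state under each scheme.

```python
def backups_needed(backups, scheme):
    """Given a chronological list of backup labels ('full',
    'incremental', 'differential'), return the indices of the
    backups required to restore the latest database state."""
    # The most recent full backup is always the starting point.
    last_full = max(i for i, b in enumerate(backups) if b == "full")
    if scheme == "full":
        return [last_full]
    if scheme == "incremental":
        # Every incremental backup after the last full, in order.
        return [last_full] + [i for i in range(last_full + 1, len(backups))
                              if backups[i] == "incremental"]
    if scheme == "differential":
        # Only the latest differential backup is needed.
        diffs = [i for i in range(last_full + 1, len(backups))
                 if backups[i] == "differential"]
        return [last_full] + diffs[-1:]
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, an incremental scheme needs the full backup plus every incremental since, while a differential scheme needs only the full backup plus the most recent differential.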
For ease of restoration and completeness of data, a full backup would be the most useful for an expert looking to analyze and provide opinions dependent on data located in a database.
This section discusses common types of data exports that can be analyzed by experts.
Delimited Text Files
A delimited text file is a text file containing data where each row represents a single data record, each column of the row represents the value of an attribute of the data record, and each attribute value is separated by a delimiter. The delimiter is commonly a single character such as a comma or tab, but can also be another single character such as a semicolon or space, or, less commonly, a sequence of multiple characters such as ‘abc’.
One of the most common forms of data exports is the Comma-Separated Values (CSV) file. A CSV file is a structured text file organized in tabular format where values are separated by commas.
Typically, the first row of the CSV file contains the data attributes/columns of the data, separated by commas. The remaining rows of the CSV file contain the values for these data attributes, also separated by commas. Another common type of data export similar to the CSV file is the Tab-Separated Values (TSV) file, where tab characters are used instead of commas to separate values.
Software failure expert witnesses can analyze the data provided in CSV or TSV files through different methods such as by importing these files into a database and writing SQL queries, importing these files into a spreadsheet and using formulas, writing scripts to parse and extract specific data, or performing text searches to locate data of interest.
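As a brief sketch of the script-based approach, Python's standard csv module parses CSV data directly; the column names and values below are illustrative, and for a TSV file one would pass `delimiter="\t"` to the reader.

```python
import csv
import io

# A small in-memory CSV export; column names are hypothetical.
data = io.StringIO(
    "defect_id,severity,status\n"
    "101,High,Closed\n"
    "102,Low,Open\n"
    "103,High,Open\n")

# DictReader treats the first row as attribute names,
# yielding one dict per data record.
rows = list(csv.DictReader(data))

# Extract records of interest, e.g. open high-severity defects.
open_high = [r["defect_id"] for r in rows
             if r["severity"] == "High" and r["status"] == "Open"]
```

In practice the expert would read from the produced file rather than an in-memory string, but the parsing and filtering are the same.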
Another common form of data export is a spreadsheet, which is typically created using tools such as Microsoft Excel, Google Sheets, or LibreOffice. While spreadsheet formulas and plugins can be used to gather some useful metrics from the data, this can often be a cumbersome process if there is a large amount of data to analyze.
In such cases, experts may need to import these spreadsheets into a database and write SQL queries to efficiently analyze the data. As spreadsheets often have formatted data that may not already be structured in a tabular format, experts may need to reorganize the data prior to analysis, taking into account factors such as hidden rows and columns, blank cells, and inconsistent data types of values.
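A simplified sketch of that reorganization step is shown below: it drops fully blank rows and coerces numeric-looking strings to numbers so the data imports cleanly into a database. The function is illustrative; real spreadsheet exports usually need additional, case-specific handling.

```python
def clean_rows(rows):
    """Drop fully blank rows and coerce digit-only strings to
    integers, leaving other cell values untouched."""
    cleaned = []
    for row in rows:
        # Skip rows where every cell is empty or missing.
        if all(cell in ("", None) for cell in row):
            continue
        out = []
        for cell in row:
            if isinstance(cell, str) and cell.strip().isdigit():
                out.append(int(cell))  # normalize numeric strings
            else:
                out.append(cell)
        cleaned.append(out)
    return cleaned
```

Hidden rows and columns, merged cells, and mixed date formats would each need their own handling on top of a pass like this.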
Another common form of data export is the JavaScript Object Notation (JSON) file. In JSON files, data is presented as key-value pairs, where the key is akin to a specific column attribute in a table. For instance, the key ‘color’ can have the value ‘red’ for a particular data record. Each data record/object is enclosed by curly braces, while the key-value pairs associated with each data object are separated by commas within those curly braces.
JSON files can be opened in any text editor, making it easy for experts to quickly determine the kinds of data being presented in the file. However, a JSON file may be difficult for an expert to analyze for some types of analyses.
For instance, in order to view the contents of a JSON file in tabular format, an expert would need to either write their own script that would process the data into such a desired format, or would need to resort to using open-source tools or plugins.
Additionally, in cases of nested data objects where some keys could have multiple possible key-value pairs as the value, it may be unwieldy to convert this data into an easily readable table.
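A minimal sketch of such a conversion script, using only Python's standard json module, is shown below. The sample records and the dotted-name convention for nested keys are illustrative choices, not a fixed standard.

```python
import json

# Illustrative JSON export with one level of nesting.
export = json.loads("""
[
  {"id": 1, "color": "red",  "tags": {"priority": "high"}},
  {"id": 2, "color": "blue", "tags": {"priority": "low"}}
]""")

def flatten(record, prefix=""):
    """Flatten nested objects into dotted column names,
    e.g. {"tags": {"priority": "high"}} -> {"tags.priority": "high"}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# One flat dict per record, ready for tabular display or import.
table = [flatten(r) for r in export]
```

Deeply nested or list-valued keys would require further decisions (e.g. splitting into separate tables), which is where the conversion becomes unwieldy.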
Another form of data export is the Extensible Markup Language (XML) file. In XML files, data is presented in a hierarchical format in the form of key-value pairs, where the key is enclosed within angle brackets, known together as a “tag”, and the value is placed between the start and end tags. Alternatively, the key-value pairs can be specified directly within the angle brackets. Each data object is also specified within angle brackets, and the key-value pairs associated with each data object are nested within.
XML files can be opened in any text editor, allowing experts to quickly determine the kinds of data being presented in the file. However, an XML file may be difficult for an expert to analyze for some types of analyses.
Similar to difficulties with analyzing a JSON file, an expert would need to write their own script or use open-source tools and plugins to view the contents of an XML file in tabular format. Similarly, some effort may be necessary to convert large nested data objects into an easily readable table.
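As a sketch of such a script, Python's standard xml.etree.ElementTree module can walk an XML export and build tabular rows. The tag and attribute names below are hypothetical; a real export would define its own structure.

```python
import xml.etree.ElementTree as ET

# Illustrative XML export: key-value pairs appear both as
# attributes inside the angle brackets (id="101") and as
# values between start and end tags (<severity>High</severity>).
doc = ET.fromstring("""
<defects>
  <defect id="101"><severity>High</severity></defect>
  <defect id="102"><severity>Low</severity></defect>
</defects>""")

# One flat dict per <defect> element, ready for tabular review.
rows = [{"id": d.get("id"),
         "severity": d.findtext("severity")}
        for d in doc.findall("defect")]
```

As with JSON, deeper nesting would require flattening decisions before the data fits an easily readable table.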
It is essential that organizations maintain regular backups of their databases for ease of data recovery in case of unexpected data loss, helping to mitigate risk. This is even more crucial in the case of legal proceedings, as restorable database backups and/or data export files can greatly assist software project failure expert witnesses in reaching qualitative and quantitative opinions regarding software project failure or software dispute situations.