The Business Intelligence and
Data Warehousing Glossary
Terminology Tours - View Terms in Logical Order - It's a great introduction to the world of Business Intelligence and Data Warehousing! Just Click NEXT ON TOUR After Reading About Each Word
We have created a utility that extends the power of Microsoft SQL Server's OLAP solution by automating the repeated creation of multiple local cube files. More information is available at:
Agent
An application that searches the data and sends an alert when a particular pattern is found.
Aggregations
Information stored in a data warehouse in a summarized form.
Instead of recording the date and time each time a certain product is sold, the data warehouse could store the quantity of the product sold each hour, each day, or each week.
Aggregations are used for two primary reasons:
| |
To save storage space. Data warehouses can get large. The use of aggregations greatly reduces the space needed to store data.
To improve the performance of business intelligence tools. When queries run faster they take up less processing time and the users get their information back more quickly. |
Some data warehouses store both the detailed information and aggregated information. This takes even more space, but gives users the possibility of looking at all the details while still having good query performance when looking at summaries.
Some systems use aggregations for historical data. Perhaps detailed data is kept on-line for a year. After that the detailed data is kept in a less accessible, permanent storage format, and only the aggregated, summary data is kept on-line.
Aggregations are often created as the sum of the individual records. You can also have aggregations for count, distinct count, maximum value, and minimum value.
Alert
A message that is sent automatically by a computer system when a certain situation occurs.
One of the greatest benefits of data warehousing is the ability to set alerts.
| |
A store manager can be automatically informed when a certain product's sales fall below or rise above a specified range.
A factory manager can be automatically informed when the failure rate of a product exceeds a specified level.
A sales manager can be automatically informed when a member of his staff achieves a personal high level of sales for a time period. |
Alerts allow a company to receive critical business information in the quickest possible time.
NEXT ON TOUR - PREVIOUS ON TOUR
Analysis Services
Business Intelligence tools included with Microsoft SQL Server 2000.
The same product in Microsoft SQL Server 7.0 was called OLAP Services. The name was changed because the Microsoft SQL Server 2000 version included data mining capabilities as well as the OLAP capabilities.
Attribute
Additional information included with a dimension, that is not used in defining the levels of the dimension.
Dimensions become more useful when there are many descriptive attributes that can be used for analyzing the data. In Microsoft Analysis Services attributes are used to create member properties and those member properties can be used to create virtual dimensions

Business Intelligence Tools
Software that enables business users to see and use large amounts of complex data.
The following three types of tools are referred to as Business Intelligence Tools:
| |
1. Multidimensional Analysis Software - Also Known As OLAP (Online Analytical Processing) - Software that gives the user the opportunity to look at the data from a variety of different dimensions.
2. Query Tools - Software that allows the user to ask questions about patterns or details in the data.
3. Data Mining Tools - Software that automatically searches for significant patterns or correlations in the data. |
NEXT ON DATA WAREHOUSING TOUR - PREVIOUS ON DATA WAREHOUSING TOUR
NEXT ON BUSINESS INTELLIGENCE TOUR - PREVIOUS ON BUSINESS INTELLIGENCE TOUR

Calculated Member
In Microsoft Analysis Services, a member that is calculated from one or more other members using an MDX formula.
Calculated members are often measures, but they can also be members of levels from other dimensions. In an MDX query, a calculated member can be created in the WITH clause.
Cell
A cell is a single point in a cube.
Cubes have cells for all of the possible combinations of points from all of the cube's dimensions.
Cellset
A set of data returned from a cube with an MDX query.
A cellset in the multidimensional world is equivalent to a rowset in the relational database world.
Changed Data Capture
In a database replication, changed data capture occurs when only the data that has changed since the previous replication is copied.
Changing Dimensions
A dimension that has level or attribute data that needs to be updated.
Data changes in the fields for levels or attributes is one of the most challenging design issues for multidimensional (star schema) data modeling. This issue is often referred to as the handling of Slowly Changing Dimensions. There are three ways outlined by Ralph Kimball (and others) to handle this situation:
Type #1. Change the data in the dimension table.
Type #2. Add new records to the dimension table that contain the new data.
Type #3. Add new fields to the dimension table to contain the values before and after the change.
Various authors (including Ralph Kimball) also suggest the splitting of dimensions into separate dimensions when a consolidated dimension would result in a high level of dimension change.
Clickstream Data
Data regarding web browsing.
Web servers capture a large amount of data in the process of receiving requests for web pages. This data includes page served, time, source of the request, type of browser making request, etc. When analyzed, this data provides information about the behavior of individuals who are browsing the internet. It can help businesses analyze where visitors are coming from, what type of visitors are most likely to buy certain products, what type of web pages are most attractive, etc. This information is essential for analyzing the effectiveness of internet ad campaigns and, in general, for finding ways to improve the effectiveness of internet commerce.
Clickstream data typically requires a significant amount of transformation as it is loaded into a data warehouse. Once in the warehouse it can be used for standard reports, for OLAP, and for data mining.
Conformed Dimension
A dimension that is used in more than one cube.
The use of conformed dimensions and shared measures is the primary way a set of data marts can be united into one consolidated data warehouse.
Cube
|
Also Known As Multidimensional Cube |
The fundamental structure for data in a multidimensional (OLAP) system.
A cube contains dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a cell.
NEXT ON TOUR - PREVIOUS ON TOUR

Data - SEE Data, Information, and Knowledge and Data-Based Knowledge
Data-Based Knowledge
Knowledge derived from data through the use of Business Intelligence Tools and the process of Data Warehousing.
Most of our knowledge is based on a combination of our experience, perception, and intuition. Business Intelligence and Data Warehousing give us a new kind of knowledge based on data.
Data-based knowledge can have several advantages over experience/intuition-based knowledge:
| |
1. It can be more accurate because it is based on so many detailed facts.
2. It can be more current because the data warehousing and business intelligence tools can so quickly analyze new data.
3. It can be more comprehensive because so many different perspectives are available through the rapid recombination of elements from different dimensions and different levels of the data hierarchy.
4. It can give new insights because there are complex patterns in the data that can be discovered by data mining that would never be detected by human analysis.
5. It can be less subjective because conclusions are tied directly to the physical data. |
SEE ALSO - Data, Information, and Knowledge
NEXT ON TOUR
Data Cleansing
Removing errors and inconsistencies from data being imported into a data warehouse.
SEE ALSO Data Quality Assurance
Data, Information, and Knowledge
Data is the reality that a computer records, stores, and processes.
The use of computers can be referred to as data processing. At the lowest level data has no significance for people. This lowest level in the perception of reality is sometimes referred to as "raw data".
Information is what a person is able to understand about reality.
Information systems use computers to organize data in such a way that people can understand the results.
Knowledge is what a business uses to make decisions.
The process of organizing information in such a way as to create data-based knowledge is called Data Warehousing. The software products that present this knowledge to users are sometimes called Business Intelligence Tools.
The goal of business intelligence and data warehousing - changing data into information and knowledge.
Organizations are gathering and storing more and more data. Every year the amount of data in the world is approximately doubling. This data is of little benefit unless it can be turned into useful information and knowledge.
Information by itself is an inadequate basis for business decisions because the amount of information, like the amount of data, is overwhelming. Business Intelligence Tools are designed to find what is significant - what really adds to our useful knowledge - in the piles of data and information.
NEXT ON TOUR
Data Mart
| |
Also Known As: Local Data Warehouse or Datamart |
A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise.
There are three different (and somewhat contradictory) views of the place of the data mart in the world of data warehousing.
| |
1. The data warehouse gathers all the information from the various legacy systems. Specialized data marts are then created with a subset of the information in the data warehouse. These data marts are easier to use because they only have the particular information the specific user group needs. The use of several data marts also allows the querying load to be spread among several different computers. This can reduce network traffic.
2. Free-standing data marts are created, independent from a data warehouse. The information for the data mart probably comes from just one legacy system. It is quicker and cheaper to build a separate data mart instead of building an enterprise-wide data warehouse with data marts derived from it. The drawback of this solution is that the company's data is not integrated (and thereby violates one of Bill Inmon's original defining characteristics of the data warehouse). If several separate data marts are built using this strategy, they will usually contain data that is duplicated and inconsistent.
3. The data mart is the prototype or the first step of a data warehousing process. An enterprise picks the division or group that would most benefit from data-based knowledge. A data mart is built with that group's data. Additional types of information are added to the data mart as time goes on until it is turned into a data warehouse. |
New terminology is often created and developed for marketing purposes. The term 'data mart' probably has a marketing advantage over the term 'data warehouse'. The whole data warehousing process is about creating data-based knowledge and bringing that knowledge to people. A warehouse is a place where things are stored away. A mart is a convenient place to buy something. Most data warehousing professionals (including myself) include ready access to information as a defining characteristic of the term 'data warehouse'. I think, though, that the term 'data mart' captures this sense of data availability more effectively.
NEXT ON TOUR - PREVIOUS ON TOUR
Data Migration
The movement of data from one environment to another.
This happens when data is brought from a legacy system into a data warehouse.
Data Mining
The process of finding hidden patterns and relationships in the data.
Analyzing data involves the recognition of significant patterns. Human analysts can see patterns in small data sets. Specialized data mining tools are able to find patterns in large amounts of data. These tools are also able to analyze significant relationships that exist only when several dimensions are viewed at the same time.
Users can ask data questions using standard queries when they know what they're looking for. Queries can be written for questions like this: "Which of our out-of-town customers have given us the most business in the last year?"
Data mining is needed when the user's questions are more vague or general in nature. Data mining questions would include: "What attributes characterize the customers that gave us the most business in the past year?"
NEXT ON TOUR - PREVIOUS ON TOUR

Data Quality Assurance
| |
Also Known As: Data Cleansing or Data Scrubbing |
The process of checking the quality of the data being imported into the data warehouse.
Data quality assurance is one of the greatest challenges in the process of data warehousing. If the data-based knowledge generated by the data warehouse is to be trusted, the data entered into the warehouse must be complete and accurate - "garbage in, garbage out".
Data quality can be a challenge for several reasons:
| |
The data is being consolidated from a variety of legacy sources that may have differing definitions of key concepts such as "customer" or "profit".
The legacy data was not originally collected for the purpose of decision support so some of the key data might be missing, incomplete, or not as accurate as desired.
There might be times when all the data is not received from one of the legacy systems. This could make comparisons between time periods invalid. |
A significant portion of time in the development process should be set aside for setting up the data quality assurance process and implementing whatever data cleansing is needed..
In a production environment, there should be a data quality report generated after each data warehouse import. There should be provision for rolling back an import if data quality testing indicates that the data is unacceptable.
NEXT ON TOUR - PREVIOUS ON TOUR
Data Scrubbing
Removing errors and inconsistencies from data being imported into a data warehouse.
SEE ALSO Data Quality Assurance
Data Transformation
The modification of data as it is moved into the data warehouse.
This modification can include:
| |
Data Cleansing - Part of the Process of Data Quality Assurance
Dimensionalization - Organizing the data into the multidimensional (OLAP) structure of a star schema.
Normalization - Organizing the data into the normal structure of a relational database
Processing Calculations
Changing Data Types
Making the Data More Readable
Replacing Codes with Actual Values
Summarizing the Data by Various Time Periods - See Aggregations
Summarizing the Data in Other Ways - See Aggregations |
NEXT ON TOUR - PREVIOUS ON TOUR
Data Warehouse
| |
Also Known As: Datawarehouse or Information Warehouse |
A database where data is collected for the purpose of being analyzed.
The defining characteristic of a data warehouse is its purpose.
Most data is collected to handle a company's on-going business. This type of data can be called "operational data". The systems used to collect operational data are referred to as OLTP (On-Line Transaction Processing).
A data warehouse collects, organizes, and makes data available for the purpose of analysis - to give management the ability to access and analyze information about its business. This type of data can be called "informational data". The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing).
Bill Inmon coined the term "data warehouse" in 1990. His definition is:
"A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."
| |
Subject-oriented - Data that gives information about a particular subject instead of about a company's on-going operations.
Integrated - Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant - All data in the data warehouse is identified with a particular time period.
Non-volatile - Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business. |
NEXT ON TOUR - PREVIOUS ON TOUR
Data Warehousing Management
The on-going supervision of the data warehousing process.
Data warehousing is an on-going process. All of the issues that need to be addressed when a data warehousing project is started also need to be addressed as the data warehouse is used and, most likely, expanded.
The types of data warehousing management issues that need to be addressed are:
| |
Deciding on Management - Who is sponsoring the project? Who is making the tough decisions? Who is going to mediate conflicts?
Deciding on Scope - Which business processes are going to be included? What granularity of data is going to be used?
Training - Of management personnel, technical personnel, and end users.
Staffing - Who is coordinating the project? Who is doing the technical work? Who is doing the training?
Budgeting - For hardware, software, personnel, training, consulting. |
NEXT ON TOUR - PREVIOUS ON TOUR

Data Warehousing
The process of visioning, planning, building, using, managing, maintaining, and enhancing data warehouses and/or data marts.
Whether we're building a data warehouse, a data mart, or both, we are taking part in a complex, on-going process. The emphasis in the data-based knowledge business needs to be kept on the process. That's why you're reading a glossary of "data warehousing terminology" instead of a glossary of "data warehouse terminology".
There are many steps in the data warehousing process -
| |
Visioning - Having an idea about what could be accomplished.
Learning - Studying the potential of data warehousing.
Justifying - Developing a business purpose for the process.
Budgeting - Counting the cost.
Deciding - Making a commitment to develop and use data-based knowledge.
Gathering Information - Examining legacy systems.
Interviewing Users - Finding what information is needed.
Choosing Tools - Choosing the hardware, the database management system, the data extraction tools, and the Business Intelligence tools..
Building, Using, Testing, and Evaluating the Prototype - Repeat this step and the above steps as necessary.
Deploying - Putting the system into operation.
Training - Helping users make full use of the Business Intelligence tools.
Managing - Keeping track of scheduled data replication, system usage, and query performance.
Adding, Modifying, On-Going Development - As the system is used, new possibilities will be discovered. |
Consider also all the actions that take place as a part of the data warehousing process -
| |
Data Replication - Periodic copying of legacy data.
Data Transformation - Transforming the legacy data into the form in which it will be stored in the data warehouse.
Data Quality Assurance - Testing the data for inconsistencies and errors.
Data Storage - Storing the data in a DBMS (Database Management System).
Metadata Storage - Storing the description of the data - the data about the data.
Data Mart Population - Populating all the data marts that receive their data from the warehouse.
Setting Up Business Intelligence Tools - Giving users access to the data through multidimensional analysis, querying, and data mining.
Setting Alerts - Establishing conditions that result in an automatic message being sent.
Data Warehousing Management - Keeping track of how well all the other actions are being carried out. |
NEXT ON TOUR - PREVIOUS ON TOUR
Data Warehousing Information Center, The
A great source for data warehousing information. Contains links to information and companies in the data warehousing business.
JUMP TO SITE - The Data Warehousing Information Center
THIS IS THE END OF THE DATA WAREHOUSING TERMINOLOGY TOUR - PREVIOUS ON TOUR
Data Warehousing Institute, The
An organization for data warehousing professionals.
JUMP TO SITE - The Data Warehousing Institute
NEXT ON TOUR - PREVIOUS ON TOUR
Database Management System (DBMS)
The software that is used to store, access, and manage data.
There are two main types of Database Management Systems used for business intelligence and data warehousing - specialized Multidimensional Database Management Systems (MDBMS) and the more widely used general purpose Relational Database Management Systems (RDBMS).
NEXT ON TOUR - PREVIOUS ON TOUR
Datamart - SEE Data Mart
Datawarehouse - SEE Data Warehouse
DBMS - SEE Database Management System
Decision Support System (DSS)
A computer system designed to assist an organization in making decisions.
The Decision Support Systems and Enterprise Information Systems of the 1980's and early 1990's were forerunners of today's Business Intelligence Tools.
Density or Dense - SEE Sparsity
Dimension
One of the perspectives that can be used to analyze the data in an OLAP cube.
When you are browsing the data in a cube, you can view the data from the perspective of different combinations of dimensions.
For a Sales database, the dimensions could include Product, Time, Store, and Promotion.
Dimensions contain one or more hierarchies, which have levels for drilling up and drilling down in the the cube. When a dimension has just one hierarchy (which is quite common ), people often refer to the dimension itself having levels.
NEXT ON TOUR - PREVIOUS ON TOUR
Dimension Table
In a star schema, a table which contains the data for one of the cube's dimensions.
The dimension table has a primary key which is used to connect it to the fact table.
The dimension table has one field for each level of each hierarchy contained in the dimension. The data values in these fields become the members of each of the dimension's levels.
The dimension table has as many attribute fields as possible. These fields describe individual characteristics of the dimension.
If there are multiple hierarchies in the dimension, there is one level field for each distinct level in each of the hierarchies. If the hierarchies share some levels in common, they are represented by a single field For Calendar and Fiscal hierarchies in a Time dimension, the level fields could be Fiscal Year, Calendar Year, Fiscal Quarter, Calendar Quarter, Month, and Day.
For the Product dimension table, some of the attribute fields could include Description, Product Number, Product Type, Department, Package Size, Weight, Shelf Length, etc.
The dimension tables in a star schema are intentionally de-normalized. The level fields and the attribute fields contain data that is duplicated in many of the records. This normally does not add a significant amount to the amount of storage space needed in the database, because the overall size of each dimension table is very small when compared to the size of the fact table.
Dimensionalization
The process of transforming data into a multidimensional (or star) schema.
Drill Down and Drill Up
The ability to move between levels of the hierarchy when viewing data with an OLAP browser.
Drill Down - Changing the view of the data to a greater level of detail.
Drill Up - Changing the view of the data to a higher level of aggregation.
Multidimensional analysis (OLAP) tools organize the data in two primary ways: in multiple dimensions and in hierarchies.
Drilling down and drilling up allow an analyst to move down and up the hierarchies to see how the information at the various levels is related. After looking at the sales totals for a store's departments, the analyst may want to drill down to see the individual sales for each employee in one of the departments. Then the analyst may choose to drill up to view how this store's total sales compare to other stores in the same region.
NEXT ON TOUR - PREVIOUS ON TOUR
DSS (See Decision Support System)
DTS (Data Transformation Services)
An ETL tool provided as a part of Microsoft SQL Server.
DTS was first released with SQL Server 7.0. It provides a design environment for creating data transformation applications.

Enterprise Information System/Executive Information System (EIS)
A computer system that presents a summary of a company's important data.
ETL (Extract, Transform, and Load)
ETL refers to the process of getting data out of one data store (Extract), modifiying it (Transform), and inserting it into a different data store (Load)

Fact table
In a star schema, the central table which contains the individual facts being stored in the database.
There are two types of fields in a fact table:
| |
1. The fields storing the foreign keys which connect each particular fact to the appropriate value in each dimension.
2. The fields storing the individual facts (or measures) - such as number, amount, or price. |
The granularity of the fact table is one of the most significant design decisions in creating a data warehouse. The facts should be as detailed as possible to allow for the data to be viewed from the greatest number of perspectives

Granularity
The level of detail of the facts stored in a data warehouse
Hierarchy
Organization of data into a logical tree structure.
Dimensions can have one or more hierarchies. A Time dimension, for example, could have a Calendar hierarchy and a Fiscal hierarchy. Hierarchies contain levels, which organize data into a logical structure.
It is the combination of a multidimensional with a hierarchical view in Business Intelligence Software that allows users to grasp large amounts of data. If each member in a level has 5 to 10 children that are members at the next lower level, the user has a better chance of understanding the significance of the data.
Moving between the levels of a hierarchy is called drilling up and drilling down.
NEXT ON TOUR - PREVIOUS ON TOUR
Hybrid OLAP (HOLAP)
A combined use of Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP).
In HOLAP, the source data is usually stored using a ROLAP strategy and aggregations are stored using a MOLAP strategy. This combination usually results in the least amount of storage space and the fastest cube processing.
Hyper-Cube
| |
Also Known As Cube and Multidimensional Cube |
A cube with more than three dimensions.
A cube is an object with three dimensions. A hyper-cube is a cube-like structure with more than three dimensions. In the world of OLAP, hyper-cubes are nearly always simply referred to as cubes.
NEXT ON TOUR - PREVIOUS ON TOUR

Information - SEE Data-Information-Knowledge Spectrum
International Data Warehousing Association, The
An organization for data warehousing professionals.
NEXT ON TOUR - PREVIOUS ON TOUR
Knowledge - SEE Data-Information-Knowledge Spectrum

Legacy System
A computer system that's been around for a while.
Sometimes organizations have several legacy systems that have been developed at different times by different people for a variety of purposes. The data in these systems is usually mutually incompatible and sometimes inaccurate. One of the biggest challenges of the data warehousing process is to bring data out of the variety of systems where it currently is located and organize it so it all fits together in the data warehouse.
Level
The hierarchies in dimensions have levels which can be used to view data at various levels of detail.
A Time dimension could have levels for Year, Quarter, Month, and Day.
A Product dimension could have levels for Product Family, Product Category, Product Subcategory, and Product Name.
A Customer Geography dimension could have levels for Region, Country, District, State, City, and Neighborhood.
NEXT ON TOUR - PREVIOUS ON TOUR
Local Cube
A cube contained in a file.
Microsoft Analysis Services (OLAP Services) provides the ability to take all or a subset of a server cube and create a local cube file. The local cube can be used to analyze OLAP data while the user is disconnected from the network.
END OF BUSINESS INTELLIGENCE TOUR - PREVIOUS ON TOUR
Local Data Warehouse - SEE Data Mart

Multidimensional Schema SEE Star Schema
MDD (Multidimensional Database) SEE Multidimensional Database Management System (MDBMS)
MDX (Multidimensional Expressions)
The querying language for OLAP cubes.
MDX has some similarities to SQL, but has many unique features. The following query returns a cellset with the names of the store regions on the columns, the names of product families on the rows, and the profit displayed in the cells:
select
[Stores].[Region].Members on columns,
[Products].{Product Family].Members on rows
from SalesCube
where ([Measures].[Profit])
Measure
A numeric value stored in a fact table and in an OLAP cube.
Sales Count, Sales Price, Cost, Discount, and Profit could all be measures in an OLAP cube.
NEXT ON TOUR - PREVIOUS ON TOUR
Member
One of the data points for a level of a hierarchy of a dimension.
Some of the members of the Month level of the Time dimension are January, February, March, and April.
NEXT ON TOUR - PREVIOUS ON TOUR
Member Property
An attribute of a level that is available for OLAP querying.
In Microsoft Analysis Services you can create member properties for any level. These member properties can be referenced directly in MDX queries and they can also be used for creating virtual dimensions.
Metadata
| |
Also Known As: Meta Data or Meta-data |
Data that describes the data in the warehouse.
Metadata includes the following:
| |
A description of tables and fields in the warehouse, including data types and the range of acceptable values.
A similar description of tables and fields in the source databases, with a mapping of fields from the source to the warehouse.
A description of how the data has been transformed, including formulae, formatting, currency conversion, and time aggregation.
Any other information that is needed to support and manage the operation of the data warehouse. |
There are a number of companies and organizations attempting to standardize the use of metadata. A standard metadata model would greatly aid the process of integrating data warehousing tools from different companies. Some data warehousing experts believe that the standardization of metadata is impossible.
NEXT ON TOUR - PREVIOUS ON TOUR
Metric - ANOTHER TERM USED FOR Dimension
Multidimensional Analysis
A process of analysis that involves organizing and summarizing data in a multiple number of dimensions.
People can comprehend a far greater amount of information if that information is organized into dimensions and into hierarchies. The wide use of spreadsheets and graphs illustrates the need for people to have their information organized.
A spreadsheet is a two-dimensional analysis tool. If a person could comprehend 10 individual facts, they could possibly comprehend 100 facts if they were arranged in a spreadsheet.
If 3 or 4 or 5 dimensions could be displayed, the amount of information that could be comprehended would be increased exponentially - to 1000 facts, 10,000 facts, and 100,000 facts.
Multidimensional data is also organized hierarchically, allowing users to "drill down" for more detailed information, "drill up" to see a broader, more summarized view, and "slice and dice" to dynamically change the combinations of dimensions that are being viewed.
NEXT ON TOUR - PREVIOUS ON TOUR
Multidimensional Cube - ANOTHER TERM USED FOR Cube
Multidimensional Database (MDD) - SEE DEFINITION FOR MDBMS
Multidimensional Database Management System (MDBMS)
A database management system that organizes data multidimensionally.
A multidimensional database management system organizes data specifically so it can be viewed with a multidimensional analysis (OLAP) tool. Because it is optimized for this purpose, it has the potential to deliver the information quickly and efficiently.
Multidimensional Online Analytical Programming (MOLAP)
OLAP that stores data and aggregations in a multidimensional database structures.

Non-Volatile
Data that does not change.
Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business. Non-volatility is one of the original defining characteristics of a data warehouse.
Normalization
The process of organizing data in accordance with the rules of a relational database.
In a completely de-normalized database the customer name and address information would be stored every time a customer made a purchase.
In a normalized database each customer's name and address would be stored only once, in a separate table. Every purchase record would have a reference to the customer table to indicate which customer was involved.
Many individual decisions have to be made in the process of normalizing a de-normalized database. How do we know which customer information refers to the same person? When there is contradictory address information, how do we choose between the various alternatives?
A fully normalized database is usually the most efficient design for an On-Line Transaction Processing System. A data warehouse, with its emphasis on efficient retrieval of data, often benefits from some intentional de-normalization. See the discussion of the Star Schema.

OLAP (On-Line Analytical Processing)
The use of computers to analyze an organization's data.
"OLAP" is the most widely used term for multidimensional analysis software. The term "On-Line Analytical Processing" was developed to distinguish data warehousing activities from "On-Line Transaction Processing" - the use of computers to run the on-going operation of a business.
In its broadest usage the term "OLAP" is used as a synonym of "data warehousing". In a more narrow usage, the term OLAP is used to refer to the tools used for Multidimensional Analysis.
"Think of an OLAP data structure as a Rubik's Cube of data that users can twist and twirl in different ways to work through what-if and what-happened scenarios." - Lee The, Editor, Datamation (May 1995)
NEXT ON TOUR - PREVIOUS ON TOUR
OLAP Browser
A tool used for multidimensional (OLAP) browsing.
OLAP Services
Business Intelligence tools included with Microsoft SQL Server 7.0.
OLAP Services was extended and renamed as Analysis Services in SQL Server 2000.
OLAP System
Term that is used as a synonym for datawarehousing system.
OLTP (OnLine Transaction Processing)
The use of computers to run the on-going operation of a business.

PivotTable Services
The tools for client access to Microsoft's Analysis Services (OLAP Services).
Private Dimension
In Microsoft Analysis Services, a dimension that is restricted in use to one particular cube.
Shared (conformed) dimensions are very useful in creating a unified data warehousing structure. You can create a dimension once and use it in several different cubes.
Private dimensions are useful in those situations where you want independent cubes. If you change a private dimension, that change only affects a single cube, whereas the change of a shared dimensions can have implications for many cubes.
Processing
In Microsoft Analysis Services, the building of cube and dimension structures and cube aggregations.
Cubes need to be processed before they can be browsed. For cubes, an incremental process only adds new data into the cube, while a full process reprocesses all the cube's data.

Relational Database Management System (RDBMS)
A Database Management System based on relational theory.
Most modern Database Management Systems (Oracle, Sybase, Microsoft SQL Server) are relational databases. These databases support a standard language - SQL (Structured Query Language).
Relational On-Line Analytical Processing (ROLAP)
OLAP that stores data and aggregations in a relational database.
Replication
The physical copying of data from one database to another.
In data warehousing replication takes place as data is moved from the on-line transaction processing system into the data warehouse. Replication also takes place if one or more data marts is being populated with data from the data warehouse.
There are several software tools that have been developed to handle replication into a data warehouse. These tools give the ability to transfer data out of and into a variety of database management systems. Many of these tools also provide data transformation and data cleansing capabilities.
Heterogeneous replication occurs when the source and the target database are not the same database management system.
Data migration is the movement of data from one environment to another - as happens when data is brought from a legacy system into a data warehouse.
Bi-directional replication is the ability to copy data in both directions between two databases.
In changed data capture only the data that has been changed since the last replication is copied.
In synchronization all the data stored in the database is replicated.
NEXT ON TOUR - PREVIOUS ON TOUR
ROLAP - SEE Relational On-Line Analytical Processing

Scale, Scalable, and Scalability
Having to do with the ability of a computer system or a database to operate efficiently with larger quantities of data.
Scalability is often discussed in situations when multiple processors are joined together. The system scales well (or is scalable) if doubling the number of processors also doubles the speed at which the system performs its tasks. The extra work involved in coordinating larger systems usually prevents them from being fully scalable - so that going from one to two processors would increase the total speed by less than a factor of two.
Schema
The logical organization of data in a database.
Shared Dimension
In Microsoft Analysis Services, a dimension used by more than one cube.
In general, a dimension that is used by more than one cube is called a conformed dimension.
Slice, Slicer, Slicing
The limiting of a cellset to data for a single member from a particular dimension.
Slicing in MDX is similar to filtering in a relational database. In an MDX query, the WHERE clause is called the slicer.
Slice and Dice
The ability to move between different combinations of dimensions when viewing data with an OLAP browser.
Multidimensional analysis tools organize the data in two primary ways: in multiple dimensions and in hierarchies.
Slicing and dicing refers to the ability to combine and re-combine the dimensions to see different slices of the information. Picture slicing a three-dimensional cube of information, in order to see what values are contained in the middle layer. Slicing and dicing a cube allows an end-user to do the same thing with multiple dimensions.
NEXT ON TOUR - PREVIOUS ON TOUR
Slowly Changing Dimensions (SCD)
A dimension that has levels or attributes that are changing on an occasional basis. SEE Changing Dimensions.
SQL (Structured Query Language)
The standard language for accessing relational databases.
Snowflaking
Normalization applied to the dimension tables of a star schema.
The star schema is a very simple database design, which clearly presents the multidimensional character of the data and allows for rapid querying of the data in a data warehouse. In snowflaking, some of the fields of the dimension tables are split off into separate tables. This achieves a higher level of normalization, but makes the database design more complex and can reduce the performance and ease of use for Business Intelligence Tools.
Sparsity and Density, Sparse and Dense
The degree to which the cells of a cube are filled with data.
One of the primary challenges of storing multidimensional data is the degree of sparsity that is often encountered. When many dimensions are considered with a fine grain of detail, most of the cells will be empty.
It is not uncommon for large cubes to have data in fewer than one in a million cells. Expressed numerically, that cube would have a density of less than .0001%.
NEXT ON TOUR - PREVIOUS ON TOUR

Star Schema (Business Definition)
A method of organizing information in a data warehouse that allows the business information to be viewed from many perspectives.
The star is a picture of the way the data is being stored. The basic factual information is in the middle of the star. The points of the star represent various perspectives from which the factual information can be viewed. In Figure 1, the Shipments can be grouped and viewed from the perspective of any of the five points of the star.

Star Schema (Technical Definition)
A database design that consists of a fact table and one or more dimension tables.
Each of the dimension tables has a single field primary key which has a one-to-many relationship with a foreign key in the fact table. The star schema is an intentional simplification of the database design that would be achieved by following the standard rules of normalization. The dimension tables are often flattened, to allow for more efficient querying (see 'snowflaking').
Structured Query Language (SQL)
The standard language for accessing relational databases.
Summary Tables
Tables used to store summarized or aggregated data.
Synchronization
The copying of all data in a database replication.

Time-variant data
Data that is identified with a particular time period.
Time-variant is one of the original defining characteristics of a data warehouse
Virtual Cube
The term used in Microsoft's Analysis Services (OLAP Services) for a cube that is created from portions of one or more base cubes.
A virtual cube is similar to a view in a relational database. It can be used for security purposes, giving users access to only some of the dimensions and measures. It can also be used to show data from separate cubes at the same time.
Virtual cubes are much more useful when you have shared dimensions and measures that are common to all the base cubes that are used.
Virtual Dimension
The term used in Microsoft's Analysis Services (OLAP Services) for a dimension that is created from one or more member properties in another dimension.
