Home   Site Map   Careers   Blog   Contact   Media
About UsProductsServices & SolutionsCase StudiesGlobal Partnership ProgramNewsEvents
Download 1KEY EVAL
Business Intelligence Blog
Business Intelligence Group

The Business Intelligence & Data Warehousing Glossary


Terminology Tours - View Terms in Logical Order - It's a great introduction to the world of Business Intelligence and Data Warehousing! Just Click NEXT ON TOUR After Reading About Each Word.

START THE DATA WAREHOUSING TERMINOLOGY TOUR

START THE BUSINESS INTELLIGENCE TERMINOLOGY TOUR

A

Agent

An application that searches the data and sends an alert when a particular pattern is found.

Aggregations

Information stored in a data warehouse in a summarized form.

Instead of recording the date and time each time a certain product is sold, the data warehouse could store the quantity of the product sold each hour, each day or each week.

Aggregations are used for two primary reasons

  • To save storage space. Data warehouses can get large. The use of aggregations greatly reduces the space needed to store data.
  • To improve the performance of business intelligence tools. When queries run faster they take up less processing time and the users get their information back more quickly.
Some data warehouses store both the detailed information and aggregated information. This takes even more space, but gives users the possibility of looking at all the details while still having good query performance when looking at summaries.

Some systems use aggregations for historical data. Perhaps detailed data is kept on-line for a year. After that the detailed data is kept in a less accessible, permanent storage format, and only the aggregated, summary data is kept on-line.

Aggregations are often created as the sum of the individual records. You can also have aggregations for count, distinct count, maximum value and minimum value. Alerts is an notification from an event that has exceeded a pre-defined threshold.


Alert

A message that is sent automatically by a computer system when a certain situation occurs.

One of the greatest benefits of data warehousing is the ability to set alerts.

A store manager can be automatically informed when a certain product's sales fall below or rise above a specified range.

A factory manager can be automatically informed when the failure rate of a product exceeds a specified level.

A sales manager can be automatically informed when a member of his staff achieves a personal high level of sales for a time period.

Alerts allow a company to receive critical business information in the quickest possible time.

Analysis Services

Business Intelligence tools included with Microsoft SQL Server 2000. 

The same product in Microsoft SQL Server 7.0 was called OLAP Services. The name was changed because the Microsoft SQL Server 2000 version included data mining capabilities as well as the OLAP capabilities.


Attribute

Additional information included with a dimension, that is not used in defining the levels of the dimension.

Dimensions become more useful when there are many descriptive attributes that can be used for analyzing the data. In Microsoft Analysis Services attributes are used to create member properties and those member properties can be used to create virtual dimensions.

Access Path

The path chosen by a database management system to retrieve the requested data.

Ad-Hoc Query

Any query that cannot be determined prior to the moment the query is issued. A query that consists of dynamically constructed SQL, which is usually constructed by desktop-resident query tools.

The ability to create an immediate ‘request for information’ using an analytical reporting tool for a special purpose or to answer a specific business question.

Ad-Hoc Query Tool

An end-user tool that accepts an English-like or point-and-click request for data and constructs an ad-hoc query to retrieve the desired result.

Administrative Data

In a data warehouse, the data that helps a warehouse administrator manage the warehouse. Examples of administrative data are user profiles and order history data.

Aggregate Data

Data that is the result of applying a process to combine data elements. Data that is taken collectively or in summary form.

Analytics

Science of Analysis to know how an entity (i.e., business) arrives at an optimal or realistic decision based on existing data.

Atomic Data

Data elements that represent the lowest level of detail. For example, in a daily sales report, the individual items sold would be atomic data, while rollups such as invoice and summary totals from invoices are aggregate data.

Authorization Request

A request initiated by a consumer to access data for which the consumer does not presently have access privileges.

Authorization Rules

Criteria used to determine whether or not an individual, group, or application may access reference data or a process.



B

Business Intelligence Tools

Software that enables business users to see and use large amounts of complex data.

The following three types of tools are referred to as Business Intelligence Tools :

  • Multidimensional Analysis Software - Also Known As OLAP (Online Analytical Processing) - Software that gives the user the opportunity to look at the data from a variety of different dimensions.
  • Query Tools - Software that allows the user to ask questions about patterns or details in the data.
  • Data Mining Tools - Software that automatically searches for significant patterns or correlations in the data.

Base Tables

The normalized data structures maintained in the target warehousing database. Also known as the detail data.

Bi-directional Extracts

The ability to extract, cleanse, and transfer data in two directions among different types of databases, including hierarchical, networked, and relational databases.

Braking Mechanism

A software mechanism that prevents users from querying the operational database once transaction loads reach a certain level.

Bulk Data Transfer

A software-based mechanism designed to move large data files. It supports compression, blocking and buffering to optimize transfer times.

Business Architectur

One of the four layers of an information systems architecture. A business architecture describes the functions a business performs and the information it uses.

Business Data

Information about people, places, things, business rules, and events, which is used to operate the business. It is not metadata. (Metadata defines and describes business data).

Business Drivers

The people, information, and tasks that support the fulfillment of a business objective.

Business Model

A view of the business at any given point in time. The view can be from a process, data, event or resource perspective, and can be the past, present or future state of the business.

Business Transaction

A unit of work acted upon by a data capture system to create, modify, or delete business data. Each transaction represents a single valued fact describing a single business event.

Business Measure

A business metric is any type of measurement used to gauge some quantifiable component of a company’s performance, such as return on investment, employee and student counts, revenues and expenses, etc.

Business Intelligence

Business Intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help clients make better business decisions. BI is the use of data to better inform decision makers within a company. It is better referred to as Business Knowledge as it is really about understanding the way your company, customers and competitors work to allow better decisions. This is achieved by combining internal data (sales, financial data) with external data (market data, public data) and using a tool to display and explore it. BI tool offers hybrid data connectivity, Trend analysis, data visualization and MIS Reporting.



C

Calculated Member

In Microsoft Analysis Services, a member that is calculated from one or more other members using an MDX formula.

Calculated members are often measures, but they can also be members of levels from other dimensions. In an MDX query, a calculated member can be created in the WITH clause.

Cell

A cell is a single point in a cube.

Cubes have cells for all of the possible combinations of points from all of the cube's dimensions.

Cellset

A set of data returned from a cube with an MDX query.

A cellset in the multidimensional world is equivalent to a rowset in the relational database world.

Changed Data Capture

In a database replication, changed data capture occurs when only the data that has changed since the previous replication is copied.

Changing Dimensions

A dimension that has level or attribute data that needs to be updated.

Data changes in the fields for levels or attributes is one of the most challenging design issues for multidimensional ( star schema) data modeling. This issue is often referred to as the handling of Slowly Changing Dimensions. There are three ways outlined by Ralph Kimball (and others) to handle this situation :

Type #1. Change the data in the dimension table.

Type #2. Add new records to the dimension table that contain the new data.

Type #3. Add new fields to the dimension table to contain the values before and after the change.

Various authors (including Ralph Kimball) also suggest the splitting of dimensions into separate dimensions when a consolidated dimension would result in a high level of dimension change. 

Clickstream Data

Data regarding web browsing.

Web servers capture a large amount of data in the process of receiving requests for web pages. This data includes page served, time, source of the request, type of browser making request, etc. When analyzed, this data provides information about the behavior of individuals who are browsing the internet. It can help businesses analyze where visitors are coming from, what type of visitors are most likely to buy certain products, what type of web pages are most attractive, etc. This information is essential for analyzing the effectiveness of internet ad campaigns and, in general, for finding ways to improve the effectiveness of internet commerce.

Clickstream data typically requires a significant amount of transformation as it is loaded into a data warehouse. Once in the warehouse it can be used for standard reports, for OLAP, and for data mining.

Conformed Dimension

A dimension that is used in more than one cube.

The use of conformed dimensions and shared measures is the primary way a set of data marts can be united into one consolidated data warehouse.

Cube

Also Known As Multidimensional Cube

The fundamental structure for data in a multidimensional (OLAP) system. 

A cube contains dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a cell.

CASE

Computer Aided Software Engineering.

CASE Management

The management of information between multiple CASE "encyclopedias," whether the same or different CASE tools.

Catalog

A component of a data dictionary that contains a directory of its DBMS objects as well as attributes of each object.

Central Warehouse

A database created from operational extracts that adheres to a single, consistent, enterprise data model to ensure consistency of decision-support data across the corporation. A style of computing where all the information systems are located and managed from a single physical location.

Change Data Capture

The process of capturing changes made to a production data source. Change data capture is typically performed by reading the source DBMS log. It consolidates units of work, ensures data is synchronized with the original source, and reduces data volume in a data warehousing environment.

Classic Data Warehouse Development

The process of building an enterprise business model, creating a system data model, defining and designing a data warehouse architecture, constructing the physical database, and lastly populating the warehouses database.

Client/Server

A distributed technology approach where the processing is divided by function. The server performs shared functions - managing communications, providing database services, etc. The client performs individual user functions - providing customized interfaces, performing screen to screen navigation, offering help functions etc.

Client/Server Processing

A form of cooperative processing in which the end-user interaction is through a programmable workstation (desktop) that must execute some part of the application logic over and above display formatting and terminal emulation.

Collection

A set of data that resulted from a DBMS query.

Communications Integrity

An operational quality that ensures transmitted data has been accurately received at its destination.

Consumer

An individual, group or application that accesses data/information in a data warehouse.

Consumer Profile

Identification of an individual, group or application and a profile of the data they request and use: the kinds of warehouse data, physical relational tables needed, and the required location and frequency of the data (when, where, and in what form it is to be delivered).

Cooperative Processing

A style of computer application processing in which the presentation, business logic, and data management are split among two or more software services that operate on one or more computers. In cooperative processing, individual software programs (services) perform specific functions that are invoked by means of parameterized messages exchanged between them.

Copy Management

The analysis of the business benefit realized by the cost of expenditure on some resource, tool, or application development.

Critical Success Factors

Key areas of activity in which favorable results are necessary for a company to reach its goal.

Crosstab

A process or function that combines and/or summarizes data from one or more sources into a concise format for analysis or reporting.

Currency Date

The date the data is considered effective. It is also known as the "as of" date or temporal currency.



D

Data - SEE Data, Information, and Knowledge and Data-Based Knowledge

Data-Based Knowledge

Knowledge derived from data through the use of Business Intelligence Tools and the process of Data Warehousing.

Most of our knowledge is based on a combination of our experience, perception, and intuition. Business Intelligence and Data Warehousing give us a new kind of knowledge based on data.

Data-based knowledge can have several advantages over experience/intuition-based knowledge :

  • It can be more accurate because it is based on so many detailed facts.
  • It can be more current because the data warehousing and business intelligence tools can so quickly analyze new data.
  • It can be more comprehensive because so many different perspectives are available through the rapid recombination of elements from different dimensions and different levels of the data hierarchy.
  • It can give new insights because there are complex patterns in the data that can be discovered by data mining that would never be detected by human analysis.
  • It can be less subjective because conclusions are tied directly to the physical data.

SEE ALSO - Data, Information and Knowledge

NEXT ON TOUR

Data Cleansing

Removing errors and inconsistencies from data being imported into a data warehouse.

SEE ALSO Data Quality Assurance

Data, Information and Knowledge

Data is the reality that a computer records, stores and processes.

The use of computers can be referred to as data processing. At the lowest level data has no significance for people. This lowest level in the perception of reality is sometimes referred to as "raw data".

Information is what a person is able to understand about reality.

Information systems use computers to organize data in such a way that people can understand the results.

Knowledge is what a business uses to make decisions.

The process of organizing information in such a way as to create data-based knowledge is called Data Warehousing. The software products that present this knowledge to users are sometimes called Business Intelligence Tools.

The goal of business intelligence and data warehousing - changing data into information and knowledge.

Organizations are gathering and storing more and more data. Every year the amount of data in the world is approximately doubling. This data is of little benefit unless it can be turned into useful information and knowledge.

Information by itself is an inadequate basis for business decisions because the amount of information, like the amount of data, is overwhelming. Business Intelligence Tools are designed to find what is significant - what really adds to our useful knowledge - in the piles of data and information.

NEXT ON TOUR

Data Mart

Also Known As : Local Data Warehouse or Datamart.

A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise.

There are three different (and somewhat contradictory) views of the place of the data mart in the world of data warehousing. 1. The data warehouse gathers all the information from the various legacy systems. Specialized data marts are then created with a subset of the information in the data warehouse. These data marts are easier to use because they only have the particular information the specific user group needs. The use of several data marts also allows the querying load to be spread among several different computers. This can reduce network traffic.

2. Free-standing data marts are created, independent from a data warehouse. The information for the data mart probably comes from just one legacy system. It is quicker and cheaper to build a separate data mart instead of building an enterprise-wide data warehouse with data marts derived from it. The drawback of this solution is that the company's data is not integrated (and thereby violates one of Bill Inmon's original defining characteristics of the data warehouse). If several separate data marts are built using this strategy, they will usually contain data that is duplicated and inconsistent.

3. The data mart is the prototype or the first step of a data warehousing process. An enterprise picks the division or group that would most benefit from data-based knowledge. A data mart is built with that group's data. Additional types of information are added to the data mart as time goes on until it is turned into a data warehouse.

New terminology is often created and developed for marketing purposes. The term 'data mart' probably has a marketing advantage over the term 'data warehouse'. The whole data warehousing process is about creating data-based knowledge and bringing that knowledge to people. A warehouse is a place where things are stored away. A mart is a convenient place to buy something. Most data warehousing professionals (including myself) include ready access to information as a defining characteristic of the term 'data warehouse'. I think, though, that the term 'data mart' captures this sense of data availability more effectively. 

A data mart is a logical subset of related information, usually built around one or a few business processes, or a specific subject area. An example is the Assessee Income Data Mart, which holds assessee and income details of salaried and professional citizens.

Data Migration

The movement of data from one environment to another.

This happens when data is brought from a legacy system into a data warehouse.

Data Mining

The process of finding hidden patterns and relationships in the data.

Analyzing data involves the recognition of significant patterns. Human analysts can see patterns in small data sets. Specialized data mining tools are able to find patterns in large amounts of data. These tools are also able to analyze significant relationships that exist only when several dimensions are viewed at the same time.

Users can ask data questions using standard queries when they know what they're looking for. Queries can be written for questions like this: "Which of our out-of-town customers have given us the most business in the last year?"

Data mining is needed when the user's questions are more vague or general in nature. Data mining questions would include: "What attributes characterize the customers that gave us the most business in the past year?"


Data Quality Assurance

Also Known As : Data Cleansing or Data Scrubbing

The process of checking the quality of the data being imported into the data warehouse.

Data quality assurance is one of the greatest challenges in the process of data warehousing. If the data-based knowledge generated by the data warehouse is to be trusted, the data entered into the warehouse must be complete and accurate - "garbage in, garbage out".

Data quality can be a challenge for several reasons :

  • The data is being consolidated from a variety of legacy sources that may have differing definitions of key concepts such as "customer" or "profit".
  • The legacy data was not originally collected for the purpose of decision support so some of the key data might be missing, incomplete, or not as accurate as desired.
  • There might be times when all the data is not received from one of the legacy systems. This could make comparisons between time periods invalid.

A significant portion of time in the development process should be set aside for setting up the data quality assurance process and implementing whatever data cleansing is needed..

In a production environment, there should be a data quality report generated after each data warehouse import. There should be provision for rolling back an import if data quality testing indicates that the data is unacceptable.

NEXT ON TOUR - PREVIOUS ON TOUR

Data Scrubbing

Removing errors and inconsistencies from data being imported into a data warehouse.

SEE ALSO Data Quality Assurance

Data Transformation

The modification of data as it is moved into the data warehouse.

This modification can include :

  • Data Cleansing - Part of the Process of Data Quality Assurance
  • Dimensionalization - Organizing the data into the multidimensional (OLAP) structure of a star schema.
  • Normalization - Organizing the data into the normal structure of a relational database
  • Processing Calculations
  • Changing Data Types
  • Making the Data More Readable
  • Replacing Codes with Actual Values
  • Summarizing the Data by Various Time Periods - See Aggregations
  • Summarizing the Data in Other Ways - See Aggregations

NEXT ON TOUR - PREVIOUS ON TOUR

Data Warehouse

Also Known As : Datawarehouse or Information Warehouse

A database where data is collected for the purpose of being analyzed.

The defining characteristic of a data warehouse is its purpose.

Most data is collected to handle a company's on-going business. This type of data can be called "operational data". The systems used to collect operational data are referred to as OLTP (On-Line Transaction Processing).

A data warehouse collects, organizes, and makes data available for the purpose of analysis - to give management the ability to access and analyze information about its business. This type of data can be called "informational data". The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing).

Bill Inmon coined the term "data warehouse" in 1990. His definition is :

"A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."

Subject-oriented - Data that gives information about a particular subject instead of about a company's on-going operations.

Integrated - Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

Time-variant - All data in the data warehouse is identified with a particular time period.

Non-volatile - Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.

NEXT ON TOUR - PREVIOUS ON TOUR

Data Warehousing Management 

The on-going supervision of the data warehousing process.

Data warehousing is an on-going process. All of the issues that need to be addressed when a data warehousing project is tarted also need to be addressed as the data warehouse is used and, most likely, expanded.

The types of data warehousing management issues that need to be addressed are :

  • Deciding on Management - Who is sponsoring the project? Who is making the tough decisions? Who is going to mediate conflicts?
  • Deciding on Scope - Which business processes are going to be included? What granularity of data is going to be used?
  • Training - Of management personnel, technical personnel, and end users.
  • Staffing - Who is coordinating the project? Who is doing the technical work? Who is doing the training? Budgeting - For hardware, software, personnel, training, consulting.

NEXT ON TOUR - PREVIOUS ON TOUR

Data Warehousing

The process of visioning, planning, building, using, managing, maintaining, and enhancing data warehouses and/or data marts.

Whether we're building a data warehouse, a data mart, or both, we are taking part in a complex, on-going process. The emphasis in the data-based knowledge business needs to be kept on the process. That's why you're reading a glossary of "data warehousing terminology" instead of a glossary of "data warehouse terminology".

There are many steps in the data warehousing process -

  • Visioning - Having an idea about what could be accomplished.
  • Learning - Studying the potential of data warehousing.
  • Justifying - Developing a business purpose for the process.
  • Budgeting - Counting the cost.
  • Deciding - Making a commitment to develop and use data-based knowledge.
  • Gathering Information - Examining legacy systems.
  • Interviewing Users - Finding what information is needed.
  • Choosing Tools - Choosing the hardware, the database management system, the data extraction tools, and the Business Intelligence tools..
  • Building, Using, Testing, and Evaluating the Prototype - Repeat this step and the above steps as necessary.
  • Deploying - Putting the system into operation.
  • Training - Helping users make full use of the Business Intelligence tools.
  • Managing - Keeping track of scheduled data replication, system usage, and query performance.
  • Adding, Modifying, On-Going Development - As the system is used, new possibilities will be discovered.

Consider also all the actions that take place as a part of the data warehousing process -

Data Transformation - Transforming the legacy data into the form in which it will be stored in the data warehouse.

Data Quality Assurance - Testing the data for inconsistencies and errors.

Data Storage - Storing the data in a DBMS (Database Management System).

Metadata Storage - Storing the description of the data - the data about the data.

Data Mart Population - Populating all the data marts that receive their data from the warehouse.

Setting Up Business Intelligence Tools - Giving users access to the data through multidimensional analysis, querying, and data mining.

Setting Alerts - Establishing conditions that result in an automatic message being sent.

Data Warehousing Management - Keeping track of how well all the other actions are being carried out.

NEXT ON TOUR - PREVIOUS ON TOUR

Data Warehousing Information Center

A great source for data warehousing information. Contains links to information and companies in the data warehousing business.

JUMP TO SITE - The Data Warehousing Information Center

THIS IS THE END OF THE DATA WAREHOUSING TERMINOLOGY TOUR - PREVIOUS ON TOUR

Data Warehousing Institute

An organization for data warehousing professionals.

JUMP TO SITE - The Data Warehousing Institute

Database Management System (DBMS)

The software that is used to store, access, and manage data.

There are two main types of Database Management Systems used for business intelligence and data warehousing - specialized Multidimensional Database Management Systems (MDBMS) and the more widely used general purpose Relational Database Management Systems (RDBMS).

Datamart - SEE Data Mart

Datawarehouse - SEE Data Warehouse

DBMS - SEE Database Management System

Decision Support System (DSS)

A computer system designed to assist an organization in making decisions.

The Decision Support Systems and Enterprise Information Systems of the 1980's and early 1990's were forerunners of today's Business Intelligence Tools.

Density or Dense - SEE Sparsity

Dimension

One of the perspectives that can be used to analyze the data in an OLAP cube.

When you are browsing the data in a cube, you can view the data from the perspective of different combinations of dimensions.

For a Sales database, the dimensions could include Product, Time, Store, and Promotion.

Dimensions contain one or more hierarchies, which have levels for drilling up and drilling down in the the cube. When a dimension has just one hierarchy (which is quite common), people often refer to the dimension itself having levels. Dimensions are categories of attributes organized for ease of data visualization – initially the schema will be organized with three dimensions for geography, time, and ‘axis of analysis’ (everything else!) but you can add dimensions to organize your attributes into further categories and hierarchies. For a Sales database, the dimensions could include Product, Time, Store, and Promotion.

Dimension Table

In a star schema, a table which contains the data for one of the cube's dimensions.

The dimension table has a primary key which is used to connect it to the fact table.

The dimension table has one field for each level of each hierarchy contained in the dimension. The data values in these fields become the members of each of the dimension's levels.

Dimensions are the smallest tables in the data warehouse, and the real 'meat' is actually the set of numeric measurements in the Fact tables. They are the entry points, the labels, the groupings, the drill-down paths for your user interface.

The dimension table has as many attribute fields as possible. These fields describe individual characteristics of the dimension.

If there are multiple hierarchies in the dimension, there is one level field for each distinct level in each of the hierarchies. If the hierarchies share some levels in common, they are represented by a single field For Calendar and Fiscal hierarchies in a Time dimension, the level fields could be Fiscal Year, Calendar Year, Fiscal Quarter, Calendar Quarter, Month, and Day.

For the Product dimension table, some of the attribute fields could include Description, Product Number, Product Type, Department, Package Size, Weight, Shelf Length etc.

The dimension tables in a star schema are intentionally de-normalized. The level fields and the attribute fields contain data that is duplicated in many of the records. This normally does not add a significant amount to the amount of storage space needed in the database, because the overall size of each dimension table is very small when compared to the size of the fact table.

Dimensionalization

The process of transforming data into a multidimensional (or star) schema.

Drill Down and Drill Up

The ability to move between levels of the hierarchy when viewing data with an OLAP browser.

Drill Down - Changing the view of the data to a greater level of detail. The term 'drill down' is the process of finding more detailed data by displaying data at a lower level than was previously shown; for example: Category > Sub-category > Product name.

Drill Up - Changing the view of the data to a higher level of aggregation. Like 'drill down', 'drill up' finds data by going up through the layers.

Multidimensional analysis (OLAP) tools organize the data in two primary ways: in multiple dimensions and in hierarchies.

Drilling down and drilling up allow an analyst to move down and up the hierarchies to see how the information at the various levels is related. After looking at the sales totals for a store's departments, the analyst may want to drill down to see the individual sales for each employee in one of the departments. Then the analyst may choose to drill up to view how this store's total sales compare to other stores in the same region.

A component of data analysis. The term “drill down” is the process of finding more detailed data by displaying data at a lower level than was previously show. “Drill up” is the process of finding less detailed data by displaying data at a higher level of aggregation.

Drill Through

Drill through enables you to display underlying data by examining results across dimensions; for example: Category > Men’s, Women’s, Children’s.

DSS (See Decision Support System)

DTS (Data Transformation Services)

An ETL tool provided as a part of Microsoft SQL Server.

DTS was first released with SQL Server 7.0. It provides a design environment for creating data transformation applications.

Data

Items representing facts, text, graphics, bit-mapped images, sound, analog or digital live-video segments. Data is the raw material of a system supplied by data producers and is used by information consumers to create information.

Factual information, especially information organized for analysis or used to reason or make decisions.

Data Access Tools

An end-user oriented tool that allows users to build SQL queries by pointing and clicking on a list of tables and fields in the data warehouse.

Data Analysis and Presentation Tools

Software that provides a logical view of data in a warehouse. Some create simple aliases for table and column names; others create data that identify the contents and location of data in the warehouse.

Data Consumer

An individual, group, or application that receives data in the form of a collection. The data is used for query, analysis and reporting.

Data Custodian

The individual assigned the responsibility of operating systems, data centers, data warehouses, operational databases and business operations in conformance with the policies and practices prescribed by the data owner.

Data Dictionary

A database about data and database structures. A catalog of all data elements, containing their names, structures, and information about their usage. A central location for metadata. Normally, data dictionaries are designed to store a limited set of available metadata, concentrating on the information relating to the data elements, databases, files and programs of implemented systems.

Data Element

The most elementary unit of data that can be identified and described in a dictionary or repository which cannot be subdivided.

Data Extraction Software

Software that reads one or more sources of data and creates a new image of the data.

Data Flow Diagram

A diagram that shows the normal flow of data between services as well as the flow of data between data stores and services.

Data Loading

The process of populating the data warehouse. Data loading is provided by DBMS-specific load processes, DBMS insert processes and independent fastload processes.

Data Management

Controlling, protecting, and facilitating access to data in order to provide information consumers with timely access to the data they need. The functions provided by a database management system.

Data Management Software

Software that converts data into a unified format by taking derived data to create new fields, merging files, summarizing and filtering data; the process of reading data from operational systems. Data Management Software is also known as data extraction software.

Data Mapping

The process of assigning a source data element to a target data element.

Data Mining

A technique using software tools geared for the user who typically does not know exactly what he's searching for, but is looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to produce data content relationships. This is also known as data surfing.

Data Model

A logical map that represents the inherent properties of the data independent of software, hardware or machine performance considerations. The model shows data elements grouped into records, as well as the association around those records.

Data Modeling

A method used to define and analyze data requirements needed to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and structures.

Data Owner

The individual responsible for the policy and practice decisions of data. For business data, the individual may be called a business owner of the data.

Data Partitioning

The process of logically and/or physically partitioning data into segments that are more easily maintained or accessed. Current RDBMS provide this kind of distribution functionality. Partitioning of data aids in performance and utility processing.

Data Pivot

A process of rotating the view of data.

Data Producer

A software service, organization, or person that provides data for update to a system-of-record.

Data Propagation

The distribution of data from one or more source data warehouses to one or more local access databases, according to propagation rules.

Data Replication

The process of copying a portion of a database from one environment to another and keeping the subsequent copies of the data in sync with the original source. Changes made to the original source are propagated to the copies of the data in other environments.

Data Scrubbing

The process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse.

Data Store

A place where data is stored; data at rest. A generic term that includes databases and flat files.

Data Surfing

See Data Mining.

Data Transfer

The process of moving data from one environment to another environment. An environment may be an application system or operating environment. See Data Transport.

Data Transformation

Creating "information" from data. This includes decoding production data and merging of records from multiple DBMS formats. It is also known as data scrubbing or data cleansing.

Data Transport

The mechanism that moves data from a source to target environment. See Data Transfer.

Data Warehouse

An implementation of an informational database used to store sharable data sourced from an operational database-of-record. It is typically a subject database that allows users to tap into a company's vast store of operational data to track and respond to business trends and facilitate forecasting and planning efforts.

A repository of corporate or institutional data that is organized in a way that is meaningful for Business Analysis and Reporting. It may also store historical information. A data warehouse is a collection of data marts.

Data Warehouse Architecture

An integrated set of products that enable the extraction and transformation of operational data to be loaded into a database for end-user analysis and reporting.

Data Warehouse Architecture Development

A service program, created by Software AG, that provides an architecture for a data warehouse that is aligned with the needs of the business. This program identifies and designs a warehouse implementation increment and ensures the required infrastructure, skill sets and other data warehouse foundational aspects are in place for a Data Warehouse Incremental Delivery.

Data Warehouse Engines

Relational databases (RDBMS) and Multi-dimensional databases (MDBMS). Data warehouse engines require strong query capabilities, fast load mechanisms and large storage requirements.

Data Warehouse Incremental Delivery

A program from Software AG that delivers one data warehouse increment from design review through implementation.

Data Warehouse Infrastructure

A combination of technologies and the interaction of technologies that support a data warehousing environment.

Data Warehouse Management Tools

Software that extracts and transforms data from operational systems and loads it into the data warehouse.

Data Warehouse Network

An integrated network of data warehouses that contain sharable data propagated from a source data warehouse on the basis of information consumer demand. The warehouses are managed to control data redundancy and to promote effective use of the sharable data.

Data Warehouse Orientation

A program from Software AG that provides an orientation to business and technical management of opportunities and approaches to data warehousing. The Orientation program encompasses a high level examination of solutions to business problems, return on investment, tools and techniques as they relate to data warehouse implementation. In addition, the program's objective is to assist customers in determining their readiness to proceed with data warehousing and to determine the appropriate data warehouse for their environment.

Database Schema

The logical and physical definition of a database structure.

DBA

Database Administrator.

Decentralized Database

A centralized database that has been partitioned according to a business or end-user defined subject area. Typically ownership is also moved to the owners of the subject area.

Decentralized Warehouse

A remote data source that users can query/access via a central gateway that provides a logical view of corporate data in terms that users can understand. The gateway parses and distributes queries in real time to remote data sources and returns result sets back to users.

Decision Support Systems (DSS)

Software that supports exception reporting, stop light reporting, standard repository, data analysis and rule-based analysis. A database created for end-user ad-hoc query processing.

Delta Update

Only the data that was updated between the last extraction or snapshot process and the current execution of the extraction or snapshot.

Denormalized Data Store

A data store that does not comply to one or more of several normal forms. See Normalization.

Derived Data

Data that is the result of a computational step applied to reference or event data. Derived data is the result either of relating two or more elements of a single transaction (such as an aggregation), or of relating one or more elements of a transaction to an external algorithm or rule.

Desktop Applications

Query and analysis tools that access the source database or data warehouse across a network using an appropriate database interface. An application that manages the human interface for data producers and information consumers.

DRDA

Distributed Relational Database Architecture. A database access standard defined by IBM.

Diving

See Drill Down and Data Mining.

Drill Down

A method of exploring detailed data that was used in creating a summary level of data. Drill down levels depend on the granularity of the data in the data warehouse.

DSS

See Decision Support System.

DWA

Data Warehouse Administrator.

Dynamic Dictionary

A data dictionary that an application program accesses at run time.

Data Visualization

A graphic representation of data; it is a way to clearly and effectively communicate information through graphical means. Data Visualization is the visual interpretation of complex relationships in multidimensional data.

Dashboard

A user interface that organizes and presents information in an easy-to-read format. Help align actions with strategy by tracking and analyzing key business metrics and goals. Enable proactive management through “what-if” analysis, customer segmentation, forecasting and analyzing business processes. Executive or Business Dashboard is a visual display (presentation) of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance. They provide intuitive indicators, such as gauges, charts and bulbs, and show the state of the business at the exact instant the dashboard is viewed or refreshed.

Data Integration

This concept describes the retrieval and organization of business data from various sources producing a unified view for the end user; easier said than done.

Data Governance

This is an emerging discipline that embodies a convergence of data quality, data management, business process management, and risk management surrounding the handling of data in an organization.

Data Quality

Data quality pertains to aspects such as availability, completeness, accuracy, consistency, relevance and timeliness of data. High data quality is essential to business intelligence’s role as a means of decisional support. Poor data quality examples: missing fields, old or inaccurate information, data conflicts, inaccessible data in legacy systems.

Database

A collection of data arranged for ease and speed of search and retrieval. A database is organized in such a way that a computer program can quickly select desired pieces of data. You can think of a database as an electronic filing system.

Dynamic Queries

Dynamically constructed SQL that is usually constructed by desktop-resident query tools. Queries that are not pre-processed and are prepared and executed at run time.



E

Enterprise Information System/Executive Information System (EIS)

Also Known As : Decision Support System (DSS)

A computer system that presents a summary of a company's important data.

ETL (Extract, Transform, and Load)

ETL refers to the process of getting data out of one data store (Extract), modifiying it (Transform) and inserting it into a different data store (Load).

EIS

Executive Information System.

End User Data

Data formatted for end-user query processing; data created by end users data provided by a data warehouse.

Enterprise

A complete business consisting of functions, divisions, or other components used to accomplish specific objectives and defined goals.

Enterprise Data

Data that is defined for use across a corporate environment.

Enterprise Modeling

The development of a common consistent view and understanding of data elements and their relationships across the enterprise.

Enterprise Resource Planning (ERP)

ERP systems are comprised of software programs which tie together all of an enterprise's various functions -- such as finance, manufacturing, sales and human resources. This software also provides for the analysis of the data from these areas to plan production, forecast sales and analyze quality. Today many organizations are realizing that to maximize the value of the information stored in their ERP systems, it is necessary to extend the ERP architectures to include more advanced reporting, analytical and decision support capabilities. This is best accomplished through the application of data warehousing tools and techniques.

Entity Identification

The identification of the entities involved in the subject area. Entity identification is the process of giving data entities unique data elements by which they can be identified.

Entity Relationship Diagramming

A process that visually identifies the relationships between data elements.

Event Analysis

A process of analyzing notifications and taking action based on the notification content.

Event-Based Execution Rules

The process of identifying those tasks that must be successfully executed to completion, or the system events that must occur, before a given task is to be triggered for processing.

Event Data

Data about business events (usually business transactions) that have historic significance or are needed for analysis by other systems. Event data may exist as atomic event data and aggregate data.

Executive Information Systems (EIS)

Tools programmed to provide canned reports or briefing books to top-level executives. They offer strong reporting and drill-down capabilities. Today these tools allow ad-hoc querying against a multi-dimensional database and most offer analytical applications along functional lines such as sales or financial analysis.

Extendibility

The ability to easily add new functionality to existing services without major software rewrites or without redefining the basic architecture.

Extract Date

The date data was extracted.

Extract Frequency

The latency of data extracts, such as daily versus weekly, monthly, quarterly, etc. The frequency that data extracts are needed in the data warehouse is determined by the shortest frequency requested through an order or by the frequency required to maintain consistency of the other associated data types in the source data warehouse.

Extract Specification

The standard expectations of a particular source data warehouse for data extracts from the operational database system-of-record. A system-of-record uses an extract specification to retrieve a snapshot of shared data, and formats the data in the way specified for updating the data in the source data warehouse. An extract specification also contains extract frequency rules for use by the Data Access environment



F

Fact table

In a star schema, the central table which contains the individual facts being stored in the database.

There are two types of fields in a fact table :

  • The fields storing the foreign keys which connect each particular fact to the appropriate value in each dimension.
  • The fields storing the individual facts (or measures) - such as number, amount, or price.

The granularity of the fact table is one of the most significant design decisions in creating a data warehouse. The facts should be as detailed as possible to allow for the data to be viewed from the greatest number of perspectives.

Fastload

A technology that typically replaces a specific DBMS load function. A fastload technology obtains significantly faster load times by preprocessing data and bypassing data integrity checks and logging.

FIFO

A method of posting a transaction in first-in-first-out order. In other words, transactions are posted in the same order that the data producer entered them.

Filters

Saved sets of chosen criteria that specify a subset of information in a data warehouse.

Frequency

The timing characteristics of the data.

Functional Data Warehouse

A warehouse that draws data from nearby operational systems. Each functional warehouse serves a distinct and separate group (such as a division), functional area (such as manufacturing), geographic unit or product marketing group.



G

Granularity

The level of detail of the facts stored in a data warehouse.

Gateway

A software product that allows SQL-based applications to access relational and non-relational data sources.

Global Business Models

Provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models. This type of data warehouse is difficult to build because it requires users from different divisions to come together to define a common data model for the warehouse.



H

Hierarchy

Organization of data into a logical tree structure.

Dimensions can have one or more hierarchies. A Time dimension, for example, could have a Calendar hierarchy and a Fiscal hierarchy. Hierarchies contain levels, which organize data into a logical structure. Data is organized by dimensions into parent-child relationships, for example 'Quarter' is the parent of 'Month' in a Time Dimension.  

It is the combination of a multidimensional with a hierarchical view in Business Intelligence Software that allows users to grasp large amounts of data. If each member in a level has 5 to 10 children that are members at the next lower level, the user has a better chance of understanding the significance of the data. 

Moving between the levels of a hierarchy is called drilling up and drilling down.

NEXT ON TOUR - PREVIOUS ON TOUR

Hybrid OLAP (HOLAP)

A combined use of Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP).

In HOLAP, the source data is usually stored using a ROLAP strategy and aggregations are stored using a MOLAP strategy. This combination usually results in the least amount of storage space and the fastest cube processing.

Hyper-Cube

Also Known As Cube and Multidimensional Cube

A cube with more than three dimensions.

A cube is an object with three dimensions. A hyper-cube is a cube-like structure with more than three dimensions. In the world of OLAP, hyper-cubes are nearly always simply referred to as cubes.

NEXT ON TOUR - PREVIOUS ON TOUR

Hash

Data allocated in an algorithmically randomized fashion in an attempt to evenly distribute data and smooth access patterns.

Historical Database

A database that provides an historical perspective on the data.

Host-Driven

A processing method in which the host computer controls the session. A host-driven session typically includes terminal emulation, front ending or client/server types of connections. The host determines what is displayed on the desktop, receives user input from the desktop and determines how the application responds to the input.

Householding

A methodology of consolidating names and addresses.



I

Information - SEE Data-Information-Knowledge Spectrum

The International Data Warehousing Association

An organization for data warehousing professionals.

Immediate Processing

Processing that occurs at the time the request for processing is made. Data may be requested and updated in an immediate mode.

Impact Analysis

Identifying the impact of change on an object to its related objects.

Increment

Data warehouse implementation can be broken down into segments or increments. An increment is a defined data warehouse implementation project that has a specified beginning and end. An increment may also be referred to as a departmental data warehouse within the context of an enterprise.

Info-Glut in Cyberspace

Too much data! (30+ million electronic mailboxes, 7000 CD-ROMs with 650 Megs, 5000+ on-line databases, 500 cable channels, etc.)

Information

Data that has been processed in such a way that it can increase the knowledge of the person who receives it. Information is the output, or "finished goods," of information systems. Information is also what individuals start with before it is fed into a Data Capture transaction processing system.

Information Consumer

A person or software service that uses data to create information.

Information Needs Analysis

The identification and analysis of the needs for information required to satisfy a particular business driver.

Information Systems Architecture

The authoritative definition of the business rules, systems structure, technical framework, and product backbone for business information systems. An information systems architecture consists of four layers: business architecture, systems architecture, technical architecture, and product architecture.

Information Warehouse

IBM's approach to data warehousing that supports the implementation of either functional, central or decentralized warehouses.

Intelligent Agent

A software routine that waits in the background and performs an action when a specified event occurs. For example, agents could transmit a summary file on the first day of the month or monitor incoming data and alert the user when certain transactions have arrived.

Interviews

A procedure to obtain prioritized information needed to generate warehouse increments.

Inverted File Indexes

A more efficient method to access data in an ad-hoc or analysis environment. It maintains indexes to all values contained in an indexed field. Those values, in turn, can be used in any combination to identify records that contain them, without actually scanning them from disk.

NEXT ON TOUR - PREVIOUS ON TOUR



J

Journal File

A file that contains update activity for rollback and data recovery purposes. Examples of update activity are commit checkpoints, as well as "before" and "after" operational database images. A journal file may be used to construct snapshot information for the data warehouse.

K

Key Performance Indicator (KPI)

Key business statistics such as number of new orders, cash collection efficiency, and return on investment (ROI), which measure a firm’s performance in critical areas. KPIs show the progress (or lack of it) toward realizing the firm’s objectives or strategic plans by monitoring activities which (if not properly performed) would likely cause severe losses or outright failure.


Knowledge - SEE Data-Information-Knowledge Spectrum



L

Legacy System

A computer system that's been around for a while.

Sometimes organizations have several legacy systems that have been developed at different times by different people for a variety of purposes. The data in these systems is usually mutually incompatible and sometimes inaccurate. One of the biggest challenges of the data warehousing process is to bring data out of the variety of systems where it currently is located and organize it so it all fits together in the data warehouse.

Level

The hierarchies in dimensions have levels which can be used to view data at various levels of detail.

A Time dimension could have levels for Year, Quarter, Month and Day.

A Product dimension could have levels for Product Family, Product Category, Product Subcategory and Product Name.

A Customer Geography dimension could have levels for Region, Country, District, State, City and Neighborhood.

NEXT ON TOUR - PREVIOUS ON TOUR

Local Cube

A cube contained in a file.

Microsoft Analysis Services (OLAP Services) provides the ability to take all or a subset of a server cube and create a local cube file. The local cube can be used to analyze OLAP data while the user is disconnected from the network.

END OF BUSINESS INTELLIGENCE TOUR - PREVIOUS ON TOUR

Local Data Warehouse - SEE Data Mart

Local Access Database (LAD)

A database that serves individual systems and workgroups as the end point for shared data distribution. LADs are the "retail outlets" of the data warehouse network. They provide direct access to the data requested by specific systems or desktop query services. Data is propagated to LADs from data warehouses according to orders for subsets of certain shared data tables and particular attributes therein, or subsets of standard collections. This data is usually located on a LAN server. If servers are not available and the data is static, it may be located on the users desktop. See Data Warehouse Network.

Local Directory

A data dictionary propagated from the repository to the desktop containing metadata used for developing desktop applications and for generating transactions. A local directory is also used to bind definitions of local data structures used by desktop applications to the data requested from servers.

Location Transparency

A mechanism that keeps the specific physical address of an object from a user. The physical location is resolved within the system so that operations can be performed without knowledge of the actual physical location.

Logical Data Model

Actual implementation of a conceptual module in a database. It may take multiple logical data models to implement one conceptual data model.



M


Multidimensional Schem a SEE Star Schema

MDD (Multidimensional Database) SEE Multidimensional Database Management System (MDBMS)

MDX (Multidimensional Expressions)

The querying language for OLAP cubes.

MDX has some similarities to SQL, but has many unique features. The following query returns a cellset with the names of the store regions  on the columns, the names of product families on the rows and the profit displayed in the cells

select 

[Stores].[Region].Members on columns,

[Products].{Product Family].Members on rows

from SalesCube

where ([Measures].[Profit])

Measure

A numeric value stored in a fact table and in an OLAP cube.

Sales Count, Sales Price, Cost, Discount and Profit could all be measures in an OLAP cube.

NEXT ON TOUR - PREVIOUS ON TOUR

Member

One of the data points for a level of a hierarchy of a dimension.

Some of the members of the Month level of the Time dimension are January, February, March and April.

NEXT ON TOUR - PREVIOUS ON TOUR

Member Property

An attribute of a level that is available for OLAP querying.

In Microsoft Analysis Services you can create member properties for any level. These member properties can be referenced directly in MDX queries and they can also be used for creating virtual dimensions.

Metadata

Also Known As : Meta Data or Meta-data

Data that describes the data in the warehouse.

Metadata includes the following

  • A description of tables and fields in the warehouse, including data types and the range of acceptable values.
  • A similar description of tables and fields in the source databases, with a mapping of fields from the source to the warehouse.
  • A description of how the data has been transformed, including formulae, formatting, currency conversion and time aggregation.
  • Any other information that is needed to support and manage the operation of the data warehouse.

There are a number of companies and organizations attempting to standardize the use of metadata. A standard metadata model would greatly aid the process of integrating data warehousing tools from different companies. Some data warehousing experts believe that the standardization of metadata is impossible.

NEXT ON TOUR - PREVIOUS ON TOUR

Metric - ANOTHER TERM USED FOR Dimension

Multidimensional Analysis

Also Known As: OLAP (On-Line Analytical Processing)

A process of analysis that involves organizing and summarizing data in a multiple number of dimensions.

People can comprehend a far greater amount of information if that information is organized into dimensions and into hierarchies. The wide use of spreadsheets and graphs illustrates the need for people to have their information organized.

A spreadsheet is a two-dimensional analysis tool. If a person could comprehend 10 individual facts, they could possibly comprehend 100 facts if they were arranged in a spreadsheet.

If 3 or 4 or 5 dimensions could be displayed, the amount of information that could be comprehended would be increased exponentially - to 1000 facts, 10,000 facts, and 100,000 facts.

Multidimensional data is also organized  hierarchically, allowing users to "drill down" for more detailed information, "drill up" to see a broader, more summarized view, and "slice and dice" to dynamically change the combinations of dimensions that are being viewed.

NEXT ON TOUR - PREVIOUS ON TOUR

Multidimensional Cube - ANOTHER TERM USED FOR Cube

Multidimensional Database (MDD) - SEE DEFINITION FOR MDBMS

Multidimensional Database Management System (MDBMS)

A database management system that organizes data multidimensionally.

A multidimensional database management system organizes data specifically so it can be viewed with a multidimensional analysis (OLAP) tool. Because it is optimized for this purpose, it has the potential to deliver the information quickly and efficiently.

Multidimensional Online Analytical Programming (MOLAP) 

OLAP that stores data and aggregations in a multidimensional database structures.

Magic Arrow

An arrow used in marketing materials that gives the illusion of an integrated and automated process.

Meta Muck

An environment created when metadata exists in multiple products and repositories (DBMS catalogs; DBMS dictionaries; CASE tools warehouse databases; end user tools; and repositories).

Metadata

Metadata is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries and navigation services. Metadata includes things like the name, length, valid values and description of a data element. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems.

The sum of all documentation about the data warehousing process in its entirety. Metadata describes the contents of the data warehouse, its structure, and the processes involved in its setup. Metadata should be updated as the system evolves. Essentially, metadata is all the information in the data warehouse environment that is not the actual data itself.

Metadata Synchronization

The process of consolidating, relating and synchronizing data elements with the same or similar meaning from different systems. Metadata synchronization joins these differing elements together in the data warehouse to allow for easier access.

Methodology

A system of principles, practices, and procedures applied to a specific branch of knowledge.

Mid-Tier Data Warehouses

To be scalable, any particular implementation of the data access environment may incorporate several intermediate distribution tiers in the data warehouse network. These intermediate tiers act as source data warehouses for geographically isolated sharable data that is needed across several business functions.

Middleware

A communications layer that allows applications to interact across hardware and network environments.

Mini Marts

A small subset of a data warehouse used by a small number of users. A mini mart is a very focused slice of a larger data warehouse.

MIP-O-Suction

A query that consumes a high percentage of CPU cycles.

MIPS

An acronym for millions of instructions per second. MIPS is mistakenly considered a relative measure of computing capability among models and vendors. It is a meaningful measure only among versions of the same processors configured with identical peripherals and software.

MPP

Massive Parallel Processing. The "shared nothing" approach of parallel computing.

Multi-dimensional Database (MDBS and MDBMS)

A powerful database that lets users analyze large amounts of data. An MDBS captures and presents data as arrays that can be arranged in multiple dimensions.

Metrics

Measures of performance that monitor progress and assess trends within an organization. A metric is the comparison of two or more measures.



N

Non-Volatile

Data that does not change.

Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business. Non-volatility is one of the original defining characteristics of a data warehouse.

Normalization

The process of organizing data in accordance with the rules of a relational database.

In a completely de-normalized database the customer name and address information would be stored every time a customer made a purchase.

In a normalized database each customer's name and address would be stored only once, in a separate table. Every purchase record would have a reference to the customer table to indicate which customer was involved.

Many individual decisions have to be made in the process of normalizing a de-normalized database. How do we know which customer information refers to the same person? When there is contradictory address information, how do we choose between the various alternatives?

A fully normalized database is usually the most efficient design for an On-Line Transaction Processing System. A data warehouse, with its emphasis on efficient retrieval of data, often benefits from some intentional de-normalization. See the discussion of the Star Schema.

Normalization

The process of reducing a complex data structure into its simplest, most stable structure. In general, the process entails the removal of redundant attributes, keys, and relationships from a conceptual data model.

O

OLAP (On-Line Analytical Processing)

The use of computers to analyze an organization's data.

"OLAP" is the most widely used term for multidimensional analysis software. The term "On-Line Analytical Processing" was developed to distinguish data warehousing activities from "On-Line Transaction Processing" - the use of computers to run the on-going operation of a business.

In its broadest usage the term "OLAP" is used as a synonym of "data warehousing". In a more narrow usage, the term OLAP is used to refer to the tools used for Multidimensional Analysis.

"Think of an OLAP data structure as a Rubik's Cube of data that users can twist and twirl in different ways to work through what-if and what-happened scenarios." - Lee The, Editor, Datamation (May 1995).

OLAP is a category of database software which provides an interface such that users can quickly and interactively examine the results in various dimensions of the data. OLAP primarily involves aggregating large amounts of diverse data. OLAP can involve millions of data items with complex relationships. Its objective is to analyze these relationships and look for patterns, trends, and exceptions.

NEXT ON TOUR - PREVIOUS ON TOUR

OLAP Engine

OLAP is a database technology that has been specially designed to deal with high performance querying and reporting. OLAP data is organised hierarchically and stored in cubes instead of tables.

OLAP Browser

A tool used for multidimensional (OLAP) browsing.

OLAP Services

Business Intelligence tools included with Microsoft SQL Server 7.0. 

OLAP Services was extended and renamed as Analysis Services in SQL Server 2000.

OLAP System

Term that is used as a synonym for datawarehousing system.

OLTP (OnLine Transaction Processing)

The use of computers to run the on-going operation of a business.

Object

A person, place, thing, or concept that has characteristics of interest to an environment. In terms of an object-oriented system, an object is an entity that combines descriptions of data and behavior.

Object Description

All the properties and associations that describe a particular object.

ODBC

Open DataBase Connectivity. A standard for database access co-opted by Microsoft from the SQL Access Group consortium.

OLAP

On-Line Analytical Processing.

OLTP

On-Line Transaction Processing. OLTP describes the requirements for a system that is used in an operational environment.

Operational Database

The database-of-record, consisting of system-specific reference data and event data belonging to a transaction-update system. It may also contain system control data such as indicators, flags, and counters. The operational database is the source of data for the data warehouse. It contains detailed data used to run the day-to-day operations of the business. The data continually changes as updates are made, and reflect the current value of the last transaction.

Operational Data Store (ODS)

An ODS is an integrated database of operational data. Its sources include legacy systems and it contains current or near term data. An ODS may contain 30 to 60 days of information, while a data warehouse typically contains years of data.

Order

A message sent to data access services which triggers the delivery of required data. There are three types of orders: select order, transform order and propagate order.



P

PivotTable Services

The tools for client access to Microsoft's Analysis Services (OLAP Services).

Private Dimension

In Microsoft Analysis Services, a dimension that is restricted in use to one particular cube.

Shared (conformed) dimensions are very useful in creating a unified data warehousing structure. You can create a dimension once and use it in several different cubes.

Private dimensions are useful in those situations where you want independent cubes. If you change a private dimension, that change only affects a single cube, whereas the change of a shared dimensions can have implications for many cubes.

Processing

In Microsoft Analysis Services, the building of cube and dimension structures and cube aggregations.

Cubes need to be processed before they can be browsed. For cubes, an incremental process only adds new data into the cube, while a full process reprocesses all the cube's data.

Parallelism

The ability to perform functions in parallel.

Population

See Data Loading and Data Replication.

Product Architecture

One of the four layers of an information systems architecture. It describes standards to be followed in each portion of the technical architecture and vendor-specific tools and services to apply in developing and running applications.

Production Data

Source data which is subject to change. It is a data capture system, often on a corporation's mainframe.

Propagated Data

Data that is transferred from a data source to one or more target environments according to propagation rules. Data propagation is normally based on transaction logic.

Protocol

A set of conventions that govern the communications between processes. Protocol specifies the format and content of messages to be exchanged.

Power User

A Power User is an information analyst who generally has responsibility for analyzing data and deriving meaningful information from that data. Power Users have an excellent understanding of their data and the underlying definitions, and know how to query it to produce the results they need.

Performance Metric

A measure of some activity related to an organization’s/unit’s business performance



Q

QFH

Query From Hell.

Quality Assurance

The process of ensuring a correct result.

Query

A (usually) complex SELECT statement for decision support. See Ad-Hoc Query or Ad-Hoc Query Tool. A request for information placed to a computer system or database. For example, for a sales database, the query could be: 'Top 10 stores by turnover for Q1 in 2010 in each country where there is more than 10 stores'.

Query Governor

A facility that terminates a database query when it has exceeded a predefined threshold.

Query Response Times

The time it takes for the warehouse engine to process a complex query across a large volume of data and return the results to the requester.

Query Tools

Software that allows a user to create and direct specific questions to a database. These tools provide the means for pulling the desired information from a database. They are typically SQL-based tools and allow a user to define data in end-user language.

R

Relational Database Management System (RDBMS)

A Database Management System based on relational theory.

Most modern Database Management Systems (Oracle, Sybase, Microsoft SQL Server) are relational databases (RDBMS). These databases support a standard language - SQL (Structured Query Language). RDBMS is a method of structuring data in the form of records so that relations between different entities and attributes can be used for data access and transformation.

Relational On-Line Analytical Processing (ROLAP)

OLAP that stores data and aggregations in a relational database.

Replication

The physical copying of data from one database to another.

In data warehousing replication takes place as data is moved from the on-line transaction processing system into the data warehouse. Replication also takes place if one or more data marts is being populated with data from the data warehouse.

There are several software tools that have been developed to handle replication into a data warehouse. These tools give the ability to transfer data out of and into a variety of database management systems. Many of these tools also provide data transformation and data cleansing capabilities.

Heterogeneous replication occurs when the source and the target database are not the same database management system.

Data migration is the movement of data from one environment to another - as happens when data is brought from a legacy system into a data warehouse.

Bi-directional replication is the ability to copy data in both directions between two databases.

In changed data capture only the data that has been changed since the last replication is copied.

In synchronization all the data stored in the database is replicated.

NEXT ON TOUR - PREVIOUS ON TOUR

ROLAP - SEE Relational On-Line Analytical Processing

RDBMS

Relational DataBase Management System.

RDBMS Concurrence

Overlapping, concurrent execution of code segments.

Redundancy

The storage of multiple copies of identical data.

Redundancy Control

Management of a distributed data environment to limit excessive copying, update, and transmission costs associated with multiple copies of the same data. Data replication is a strategy for redundancy control with the intention to improve performance.

Reference Data

Business data that has a consistent meaning and definition and is used for reference and validation (Process, Person, Vendor, and Customer, for example). Reference data is fundamental to the operation of the business. The data is used for transaction validation by the data capture environment, decision support systems, and for representation of business rules. Its source for distribution and use is a data warehouse.

Refresh Technology

A process of taking a snapshot from one environment and moving it to another environment overlaying old data with the new data each time.

Replicated Data

Data that is copied from a data source to one or more target environments based on replication rules. Replicated data can consist of full tables or rectangular extracts.

Repository Environment

The repository environment contains the complete set of a business's metadata. It is globally accessible. As compared to a data dictionary, the repository environment not only contains an expanded set of metadata, but can be implemented across multiple hardware platforms and database management systems (DBMS).

Roll Up Queries

Queries that summarize data at a level higher than the previous level of detail.

RPC

Remote Procedure Call.



S

Scale , Scalable, and Scalability

Having to do with the ability of a computer system or a database to operate efficiently with larger quantities of data.

Scalability is often discussed in situations when multiple processors are joined together. The system scales well (or is scalable) if doubling the number of processors also doubles the speed at which the system performs its tasks. The extra work involved in coordinating larger systems usually prevents them from being fully scalable - so that going from one to two processors would increase the total speed by less than a factor of two.

Schema

The logical organization of data in a database.

Shared Dimension

In Microsoft Analysis Services, a dimension used by more than one cube.

In general, a dimension that is used by more than one cube is called a conformed dimension. 

Slice, Slicer, Slicing

The limiting of a cellset to data for a single member from a particular dimension. 

Slicing in MDX is similar to filtering in a relational database. In an MDX query, the WHERE clause is called the slicer. 

Slice and Dice

The ability to move between different combinations of dimensions when viewing data with an OLAP browser.

Multidimensional analysis tools organize the data in two primary ways: in multiple dimensions and in hierarchies.

Slicing and dicing refers to the ability to combine and re-combine the dimensions to see different slices of the information. Picture slicing a three-dimensional cube of information, in order to see what values are contained in the middle layer. Slicing and dicing a cube allows an end-user to do the same thing with multiple dimensions.

A term used to describe a function at the core of informational analysis. Multidimensional tools allow users to view data from any angle. The ability to select various angles to view data from is called slice and dice capability. Rotating the presentation between rows, columns and pages is a feature built into crosstabs and pivot tables.

NEXT ON TOUR - PREVIOUS ON TOUR

Slowly Changing Dimensions (SCD)

A dimension that has levels or attributes that are changing on an occasional basis.
SEE Changing Dimensions.

SQL (Structured Query Language)

The standard language for accessing relational databases.

Snowflaking

Normalization applied to the dimension tables of a star schema.

The star schema is a very simple database design, which clearly presents the multidimensional character of the data and allows for rapid querying of the data in a data warehouse. In snowflaking, some of the fields of the dimension tables are split off into separate tables. This achieves a higher level of normalization, but makes the database design more complex and can reduce the performance and ease of use for Business Intelligence Tools.

Sparsity and Density, Sparse and Dense

The degree to which the cells of a cube are filled with data.

One of the primary challenges of storing multidimensional data is the degree of sparsity that is often encountered. When many dimensions are considered with a fine grain of detail, most of the cells will be empty. 

It is not uncommon for large cubes to have data in fewer than one in a million cells. Expressed numerically, that cube would have a density of less than .0001%.

NEXT ON TOUR - PREVIOUS ON TOUR

Scalability

The ability to scale to support larger or smaller volumes of data and more or less users. The ability to increase or decrease size or capability in cost-effective increments with minimal impact on the unit cost of business and the procurement of additional services.

Schema

The logical and physical definition of data elements, physical characteristics and inter-relationships.

Securability

The ability to provide differing access to individuals according to the classification of data and the user's business function, regardless of the variations.

SELECT

A SQL statement (command) that specifies data retrieval operations for rows of data in a relational database.

Semantic Mapping

The mapping of the meaning of a piece of data.

Server

A service that provides standard functions for clients in response to standard messages from clients.
Note : A commonly used definition of server also refers to the physical computer from which services are provided.

Slice and Dice

A term used to describe a complex data analysis function provided by MDBMS tools.

SMP

Symmetrical Multi-Processing. The "shared everything" approach of parallel computing.

Source Database

An operational, production database or a centralized warehouse that feeds into a target database.

SQL

Structured Query Language. A computer language for accessing relational, ODBC, DRDA, or non-relational compliant database systems.

SQL-Compliant

Conformity to ANSI standards for Structured Query Language specifications.

SQL Query Tool

An end-user tool that accepts SQL to be processed against one or more relational databases.

Standard Query

A stored procedure of a recently executed query. Technically, a standard query may be stored on the desktop as "canned" SQL and passed as dynamic SQL to the server database to execute. This is undesirable unless the stored query is seldom executed.

Static Query

A stored, parameterized procedure, optimized for access to a particular data warehouse.

Stoplighting

A technique using colored circles to identify the content of a data element. The colors are defined by a set of predefined thresholds.

Subject Oriented Databases

Rather than build one massive, centralized data warehouse, most companies are building numerous subject-oriented warehouses to serve the needs of different divisions.

Summarization Tables

These tables are created along commonly used access dimensions to speed query performance, although the redundancies increase the amount of data in the warehouse. See Aggregate Data.

Syntactic Mapping

The mapping required to unravel the syntax of information.

Systems Architecture

One of the four layers of the information systems architecture. The systems architecture represents the definitions and inter-relationships between applications and the product architecture.

Star Schema (Business Definition)

A method of organizing information in a data warehouse that allows the business information to be viewed from many perspectives.

The star is a picture of the way the data is being stored. The basic factual information is in the middle of the star. The points of the star represent various perspectives from which the factual information can be viewed. In Figure 1, the Shipments can be grouped and viewed from the perspective of any of the five points of the star.

Star Schema (Technical Definition)

A database design that consists of a fact table and one or more dimension tables.

Each of the dimension tables has a single field primary key which has a one-to-many relationship with a foreign key in the fact table. The star schema is an intentional simplification of the database design that would be achieved by following the standard rules of normalization. The dimension tables are often flattened, to allow for more efficient querying (see 'snowflaking').

Structured Query Language (SQL)

The standard language for accessing relational databases.

Summary Tables

Tables used to store summarized or aggregated data.

Synchronization

The copying of all data in a database replication.



T

Time-variant data

Data that is identified with a particular time period.

Time-variant is one of the original defining characteristics of a data warehouse.

Tactical Data Warehouse Development

The process of selecting a portion of an enterprise and implementing a data warehouse. The process includes constructing a data model for the area, determining the data warehouse architecture, constructing the physical model, and populating the warehouse database. It also includes creating or buying the applications to access the data warehouse, prototyping the tactical warehouses (access definitions, views, etc.) and incorporating end-user feedback.

TANSTAAFL

There Ain't No Such Thing As A Free Lunch! In developing a data warehouse, there is work involved, and there is no "free lunch."

Target Database

The database in which data will be loaded or inserted.

Technical Architecture

One of the four layers of the information systems architecture. The technical architecture defines and describes the interfaces, parameters and protocols used by the product and systems architecture.

The End User Mindset

"Give me what I say I want, then I can tell you what I really want." To build a successful data warehouse, end users must be able to explore the possibilities.

Tool Encyclopedias

Encyclopedias, repositories or dictionaries used by application development tools. The non-definable "repository" used by a tool.

Transformers

Rules applied to change data.

Triggering Data

Data that selects and loads data on a scheduled basis.

Unit of Work Consolidation

The process of consolidating multiple updates to a single row image into a single update.

U

Update

Not allowed in a data warehouse.



V

Virtual Cube

The term used in Microsoft's Analysis Services (OLAP Services) for a cube that is created from portions of one or more base cubes.

A virtual cube is similar to a view in a relational database. It can be used for security purposes, giving users access to only some of the dimensions and measures. It can also be used to show data from separate cubes at the same time.

Virtual cubes are much more useful when you have shared dimensions and measures that are common to all the base cubes that are used. 

Virtual Dimension

The term used in Microsoft's Analysis Services (OLAP Services) for a dimension that is created from one or more member properties in another dimension.

Versioning

The ability for a single definition to maintain information about multiple physical instantiations.

VITAL

An acronym for Virtually Integrated Technical Architecture Lifecycle.

VITAL Compliance

Conformance to the design objectives and principles, distributed computing styles, development approaches, standards, and data-distribution and access techniques; functional compatibility with VITAL.

VLDB

Very Large Databases.

W

Warehouse Business Directory

Provides business professionals access to the data warehouse by browsing a catalog of contents.

Warehouse Technical Directory

Defines and manages an information life cycle, a definition of warehouse construction, change management, impact analysis, distribution and operation of a data warehouse.



X

XML (eXtensible Markup Language)

A method of sharing data between disparate data systems, without needing a direct connection between them.

XML for Analysis Services

An XML schema that can be used to communicate with a Microsoft Analysis Server.


Z


5 Golden Rules
Register A Call