Data - SEE Data, Information, and Knowledge and Data-Based Knowledge
Data-Based Knowledge
Knowledge derived from data through the use of Business Intelligence Tools and the process of Data Warehousing.
Most of our knowledge is based on a combination of our experience, perception, and intuition. Business Intelligence and Data Warehousing give us a new kind of knowledge based on data.
Data-based knowledge can have several advantages over experience/intuition-based knowledge :
- It can be more accurate because it is based on so many detailed facts.
- It can be more current because the data warehousing and business intelligence tools can so quickly analyze new data.
- It can be more comprehensive because so many different perspectives are available through the rapid recombination of elements from different dimensions and different levels of the data hierarchy.
- It can give new insights because there are complex patterns in the data that can be discovered by data mining that would never be detected by human analysis.
- It can be less subjective because conclusions are tied directly to the physical data.
SEE ALSO - Data, Information and Knowledge
Data Cleansing
Removing errors and inconsistencies from data being imported into a data warehouse.
SEE ALSO Data Quality Assurance
Data, Information and Knowledge
Data is the reality that a computer records, stores and processes.
The use of computers can be referred to as data processing. At the lowest level data has no significance for people. This lowest level in the perception of reality is sometimes referred to as "raw data".
Information is what a person is able to understand about reality.
Information systems use computers to organize data in such a way that people can understand the results.
Knowledge is what a business uses to make decisions.
The process of organizing information in such a way as to create data-based knowledge is called Data Warehousing. The software products that present this knowledge to users are sometimes called Business Intelligence Tools.
The goal of business intelligence and data warehousing is to change data into information and knowledge.
Organizations are gathering and storing more and more data. Every year the amount of data in the world is approximately doubling. This data is of little benefit unless it can be turned into useful information and knowledge.
Information by itself is an inadequate basis for business decisions because the amount of information, like the amount of data, is overwhelming. Business Intelligence Tools are designed to find what is significant - what really adds to our useful knowledge - in the piles of data and information.
Data Mart
Also Known As : Local Data Warehouse or Datamart.
A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise.
There are three different (and somewhat contradictory) views of the place of the data mart in the world of data warehousing.
1. The data warehouse gathers all the information from the various legacy systems. Specialized data marts are then created with a subset of the information in the data warehouse. These data marts are easier to use because they only have the particular information the specific user group needs. The use of several data marts also allows the querying load to be spread among several different computers. This can reduce network traffic.
2. Free-standing data marts are created, independent from a data warehouse. The information for the data mart probably comes from just one legacy system. It is quicker and cheaper to build a separate data mart instead of building an enterprise-wide data warehouse with data marts derived from it. The drawback of this solution is that the company's data is not integrated (and thereby violates one of Bill Inmon's original defining characteristics of the data warehouse). If several separate data marts are built using this strategy, they will usually contain data that is duplicated and inconsistent.
3. The data mart is the prototype or the first step of a data warehousing process. An enterprise picks the division or group that would most benefit from data-based knowledge. A data mart is built with that group's data. Additional types of information are added to the data mart as time goes on until it is turned into a data warehouse.
New terminology is often created and developed for marketing purposes. The term 'data mart' probably has a marketing advantage over the term 'data warehouse'. The whole data warehousing process is about creating data-based knowledge and bringing that knowledge to people. A warehouse is a place where things are stored away. A mart is a convenient place to buy something. Most data warehousing professionals (including myself) include ready access to information as a defining characteristic of the term 'data warehouse'. I think, though, that the term 'data mart' captures this sense of data availability more effectively.
A data mart is a logical subset of related information, usually built around one or a few business processes, or a specific subject area. An example is the Assessee Income Data Mart, which holds assessee and income details of salaried and professional citizens.
Data Migration
The movement of data from one environment to another.
This happens when data is brought from a legacy system into a data warehouse.
Data Mining
The process of finding hidden patterns and relationships in the data.
Analyzing data involves the recognition of significant patterns. Human analysts can see patterns in small data sets. Specialized data mining tools are able to find patterns in large amounts of data. These tools are also able to analyze significant relationships that exist only when several dimensions are viewed at the same time.
Users can ask data questions using standard queries when they know what they're looking for. Queries can be written for questions like this: "Which of our out-of-town customers have given us the most business in the last year?"
Data mining is needed when the user's questions are more vague or general in nature. Data mining questions would include: "What attributes characterize the customers that gave us the most business in the past year?"
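The first kind of question can be answered with an ordinary query. The following is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, the city names, and the year are all hypothetical and chosen only for illustration. It covers only the standard-query half of the contrast; a data mining tool would be needed for the second kind of question.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, city TEXT, sale_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Acme", "Springfield", 2023, 500.0),
        ("Acme", "Springfield", 2023, 250.0),
        ("Bolt", "Shelbyville", 2023, 900.0),
        ("Core", "Hometown", 2023, 2000.0),    # local customer, excluded
        ("Bolt", "Shelbyville", 2022, 100.0),  # earlier year, excluded
    ],
)

# "Which of our out-of-town customers gave us the most business last year?"
top_out_of_town = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM sales
    WHERE city <> ? AND sale_year = ?
    GROUP BY customer
    ORDER BY total DESC
    """,
    ("Hometown", 2023),
).fetchall()

print(top_out_of_town)
```

The query works because the user already knows which fields matter (city, year, amount). The data mining question gives no such starting point.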

Data Quality Assurance
Also Known As : Data Cleansing or Data Scrubbing
The process of checking the quality of the data being imported into the data warehouse.
Data quality assurance is one of the greatest challenges in the process of data warehousing. If the data-based knowledge generated by the data warehouse is to be trusted, the data entered into the warehouse must be complete and accurate - "garbage in, garbage out".
Data quality can be a challenge for several reasons :
- The data is being consolidated from a variety of legacy sources that may have differing definitions of key concepts such as "customer" or "profit".
- The legacy data was not originally collected for the purpose of decision support, so some of the key data might be missing, incomplete, or not as accurate as desired.
- There might be times when not all of the data is received from one of the legacy systems. This could make comparisons between time periods invalid.
A significant portion of time in the development process should be set aside for setting up the data quality assurance process and implementing whatever data cleansing is needed. In a production environment, a data quality report should be generated after each data warehouse import. There should be provision for rolling back an import if data quality testing indicates that the data is unacceptable.
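As a rough illustration of a data quality report and a rollback decision, the following Python sketch checks one imported batch. The field names (`customer_id`, `amount`), the helper functions `quality_report` and `should_roll_back`, and the 5% threshold are hypothetical, introduced only for this example. A real process would test many more conditions.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]

REQUIRED_FIELDS = ("customer_id", "amount")  # hypothetical required fields


def quality_report(rows: List[Row], expected_rows: int) -> Dict[str, float]:
    """Summarize simple quality measures for one imported batch."""
    incomplete = sum(
        1 for row in rows
        if any(row.get(field) is None for field in REQUIRED_FIELDS)
    )
    return {
        "rows_received": len(rows),
        "rows_expected": expected_rows,
        "incomplete_rows": incomplete,
        "incomplete_ratio": incomplete / len(rows) if rows else 1.0,
    }


def should_roll_back(report: Dict[str, float],
                     max_incomplete_ratio: float = 0.05) -> bool:
    """Reject the import if rows are missing or too many are incomplete."""
    if report["rows_received"] < report["rows_expected"]:
        return True
    return report["incomplete_ratio"] > max_incomplete_ratio


batch = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": None},  # missing measure
    {"customer_id": 3, "amount": 75.5},
]
report = quality_report(batch, expected_rows=3)
print(report, should_roll_back(report))
```

Here one of three rows is incomplete, which exceeds the assumed threshold, so the sketch recommends rolling the import back.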
Data Scrubbing
Removing errors and inconsistencies from data being imported into a data warehouse.
SEE ALSO Data Quality Assurance
Data Transformation
The modification of data as it is moved into the data warehouse.
This modification can include :
- Data Cleansing - Part of the Process of Data Quality Assurance
- Dimensionalization - Organizing the data into the multidimensional (OLAP) structure of a star schema.
- Normalization - Organizing the data into the normal structure of a relational database
- Processing Calculations
- Changing Data Types
- Making the Data More Readable
- Replacing Codes with Actual Values
- Summarizing the Data by Various Time Periods - See Aggregations
- Summarizing the Data in Other Ways - See Aggregations
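Three of the modifications listed above (changing data types, replacing codes with actual values, and summarizing by time period) can be sketched in a few lines of Python. The legacy rows, the `REGION_NAMES` code table, and the field names are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from datetime import date

REGION_NAMES = {"N": "North", "S": "South"}  # hypothetical code table

# Legacy rows arrive as text, with coded region values.
legacy_rows = [
    {"sale_date": "2024-01-15", "region": "N", "amount": "100.50"},
    {"sale_date": "2024-01-20", "region": "N", "amount": "49.50"},
    {"sale_date": "2024-02-03", "region": "S", "amount": "200.00"},
]


def transform(row):
    """Convert one legacy row into its warehouse form."""
    return {
        "sale_date": date.fromisoformat(row["sale_date"]),  # change data type
        "region": REGION_NAMES[row["region"]],              # replace code
        "amount": float(row["amount"]),                     # change data type
    }


clean_rows = [transform(row) for row in legacy_rows]

# Summarize the data by month and region (an aggregation).
monthly_totals = defaultdict(float)
for row in clean_rows:
    key = (row["sale_date"].year, row["sale_date"].month, row["region"])
    monthly_totals[key] += row["amount"]

print(dict(monthly_totals))
```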
Data Warehouse
Also Known As : Datawarehouse or Information Warehouse
A database where data is collected for the purpose of being analyzed. The defining characteristic of a data warehouse is its purpose.
Most data is collected to handle a company's on-going business. This type of data can be called "operational data". The systems used to collect operational data are referred to as OLTP (On-Line Transaction Processing).
A data warehouse collects, organizes, and makes data available for the purpose of analysis - to give management the ability to access and analyze information about its business. This type of data can be called "informational data". The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing).
Bill Inmon coined the term "data warehouse" in 1990. His definition is :
"A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."
Subject-oriented - Data that gives information about a particular subject instead of about a company's on-going operations.
Integrated - Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant - All data in the data warehouse is identified with a particular time period.
Non-volatile - Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.
Data Warehousing Management
The on-going supervision of the data warehousing process.
Data warehousing is an on-going process. All of the issues that need to be addressed when a data warehousing project is started also need to be addressed as the data warehouse is used and, most likely, expanded.
The types of data warehousing management issues that need to be addressed are :
- Deciding on Management - Who is sponsoring the project? Who is making the tough decisions? Who is going to mediate conflicts?
- Deciding on Scope - Which business processes are going to be included? What granularity of data is going to be used?
- Training - Of management personnel, technical personnel, and end users.
- Staffing - Who is coordinating the project? Who is doing the technical work? Who is doing the training?
- Budgeting - For hardware, software, personnel, training, consulting.
Data Warehousing
The process of visioning, planning, building, using, managing, maintaining, and enhancing data warehouses and/or data marts.
Whether we're building a data warehouse, a data mart, or both, we are taking part in a complex, on-going process. The emphasis in the data-based knowledge business needs to be kept on the process. That's why you're reading a glossary of "data warehousing terminology" instead of a glossary of "data warehouse terminology".
There are many steps in the data warehousing process -
- Visioning - Having an idea about what could be accomplished.
- Learning - Studying the potential of data warehousing.
- Justifying - Developing a business purpose for the process.
- Budgeting - Counting the cost.
- Deciding - Making a commitment to develop and use data-based knowledge.
- Gathering Information - Examining legacy systems.
- Interviewing Users - Finding what information is needed.
- Choosing Tools - Choosing the hardware, the database management system, the data extraction tools, and the Business Intelligence tools.
- Building, Using, Testing, and Evaluating the Prototype - Repeat this step and the above steps as necessary.
- Deploying - Putting the system into operation.
- Training - Helping users make full use of the Business Intelligence tools.
- Managing - Keeping track of scheduled data replication, system usage, and query performance.
- Adding, Modifying, On-Going Development - As the system is used, new possibilities will be discovered.
Consider also all the actions that take place as a part of the data warehousing process -
Data Replication - Periodic copying of legacy data.
Data Transformation - Transforming the legacy data into the form in which it will be stored in the data warehouse.
Data Quality Assurance - Testing the data for inconsistencies and errors.
Data Storage - Storing the data in a DBMS (Database Management System).
Metadata Storage - Storing the description of the data - the data about the data.
Data Mart Population - Populating all the data marts that receive their data from the warehouse.
Setting Up Business Intelligence Tools - Giving users access to the data through multidimensional analysis, querying, and data mining.
Setting Alerts - Establishing conditions that result in an automatic message being sent.
Data Warehousing Management - Keeping track of how well all the other actions are being carried out.
Data Warehousing Information Center
A great source for data warehousing information. Contains links to information and companies in the data warehousing business.
Data Warehousing Institute
An organization for data warehousing professionals.
Database Management System (DBMS)
The software that is used to store, access, and manage data.
There are two main types of Database Management Systems used for business intelligence and data warehousing - specialized Multidimensional Database Management Systems (MDBMS) and the more widely used general purpose Relational Database Management Systems (RDBMS).
Datamart - SEE Data Mart
Datawarehouse - SEE Data Warehouse
DBMS - SEE Database Management System
Decision Support System (DSS)
A computer system designed to assist an organization in making decisions.
The Decision Support Systems and Enterprise Information Systems of the 1980s and early 1990s were forerunners of today's Business Intelligence Tools.
Density or Dense - SEE Sparsity
Dimension
One of the perspectives that can be used to analyze the data in an OLAP cube.
When you are browsing the data in a cube, you can view the data from the perspective of different combinations of dimensions.
For a Sales database, the dimensions could include Product, Time, Store, and Promotion.
Dimensions contain one or more hierarchies, which have levels for drilling up and drilling down in the cube. When a dimension has just one hierarchy (which is quite common), people often refer to the dimension itself as having levels. Dimensions are categories of attributes organized for ease of data visualization. Initially the schema will be organized with three dimensions for geography, time, and 'axis of analysis' (everything else), but you can add dimensions to organize your attributes into further categories and hierarchies.
Dimension Table
In a star schema, a table which contains the data for one of the cube's dimensions.
The dimension table has a primary key which is used to connect it to the fact table.
The dimension table has one field for each level of each hierarchy contained in the dimension. The data values in these fields become the members of each of the dimension's levels.
Dimension tables are the smallest tables in the data warehouse; the real 'meat' is the set of numeric measurements in the fact tables. The dimension tables provide the entry points, the labels, the groupings, and the drill-down paths for your user interface.
The dimension table has as many attribute fields as possible. These fields describe individual characteristics of the dimension.
If there are multiple hierarchies in the dimension, there is one level field for each distinct level in each of the hierarchies. If the hierarchies share some levels in common, each shared level is represented by a single field. For Calendar and Fiscal hierarchies in a Time dimension, the level fields could be Fiscal Year, Calendar Year, Fiscal Quarter, Calendar Quarter, Month, and Day.
For the Product dimension table, some of the attribute fields could include Description, Product Number, Product Type, Department, Package Size, Weight, Shelf Length etc.
The dimension tables in a star schema are intentionally de-normalized. The level fields and the attribute fields contain data that is duplicated in many of the records. This normally does not add a significant amount to the amount of storage space needed in the database, because the overall size of each dimension table is very small when compared to the size of the fact table. 
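The relationship between a de-normalized dimension table and a fact table can be sketched with Python's built-in `sqlite3` module. The table names (`product_dim`, `sales_fact`), the columns, and the sample rows are hypothetical. The sketch shows only one dimension, where a real star schema would have several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: level and attribute fields, intentionally de-normalized
    -- (the department value repeats on every product row).
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        description  TEXT,
        product_type TEXT,
        department   TEXT
    );
    -- Fact table: numeric measurements keyed by the dimension's primary key.
    CREATE TABLE sales_fact (
        product_key  INTEGER REFERENCES product_dim(product_key),
        units_sold   INTEGER,
        sales_amount REAL
    );
    INSERT INTO product_dim VALUES
        (1, 'Whole Milk 1L', 'Milk',  'Dairy'),
        (2, 'Skim Milk 1L',  'Milk',  'Dairy'),
        (3, 'Rye Bread',     'Bread', 'Bakery');
    INSERT INTO sales_fact VALUES
        (1, 10, 15.0), (2, 4, 6.0), (3, 7, 21.0), (1, 2, 3.0);
    """
)

# Group the fact measurements by a level field taken from the dimension table.
by_department = conn.execute(
    """
    SELECT d.department, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS d ON d.product_key = f.product_key
    GROUP BY d.department
    ORDER BY d.department
    """
).fetchall()

print(by_department)
```

The repeated 'Dairy' value costs little because the dimension table has only a few rows, while the fact table grows with every sale.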
Dimensionalization
The process of transforming data into a multidimensional (or star) schema.
Drill Down and Drill Up
The ability to move between levels of the hierarchy when viewing data with an OLAP browser.
Drill Down - Changing the view of the data to a greater level of detail. To drill down is to find more detailed data by displaying data at a lower level than was previously shown; for example: Category > Sub-category > Product name.
Drill Up - Changing the view of the data to a higher level of aggregation. Like 'drill down', 'drill up' finds data by going up through the layers.
Multidimensional analysis (OLAP) tools organize the data in two primary ways: in multiple dimensions and in hierarchies.
Drilling down and drilling up allow an analyst to move down and up the hierarchies to see how the information at the various levels is related. After looking at the sales totals for a store's departments, the analyst may want to drill down to see the individual sales for each employee in one of the departments. Then the analyst may choose to drill up to view how this store's total sales compare to other stores in the same region.
Drill down and drill up are components of data analysis. “Drill down” is the process of finding more detailed data by displaying data at a lower level than was previously shown. “Drill up” is the process of finding less detailed data by displaying data at a higher level of aggregation.
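Moving between hierarchy levels amounts to re-aggregating the same facts at a different depth. The following Python sketch assumes a hypothetical Region > Store > Department hierarchy and invented sales figures. An OLAP browser would do the equivalent interactively.

```python
from collections import defaultdict

# Hypothetical facts: (region, store, department, amount).
sales = [
    ("East", "Store 1", "Toys", 100.0),
    ("East", "Store 1", "Garden", 50.0),
    ("East", "Store 2", "Toys", 70.0),
    ("West", "Store 3", "Toys", 30.0),
]
LEVELS = ("region", "store", "department")  # top of the hierarchy first


def totals_at(level):
    """Aggregate sales at the given level of the hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for *path, amount in sales:
        totals[tuple(path[:depth])] += amount
    return dict(totals)


region_view = totals_at("region")          # highest level of aggregation
store_view = totals_at("store")            # drill down one level
department_view = totals_at("department")  # drill down again

print(region_view)
print(store_view)
```

Calling `totals_at("region")` after viewing `store_view` corresponds to drilling back up.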
Drill Through
Drill through enables you to display underlying data by examining results across dimensions; for example: Category > Men’s, Women’s, Children’s.
DSS (See Decision Support System)
DTS (Data Transformation Services)
An ETL tool provided as a part of Microsoft SQL Server.
DTS was first released with SQL Server 7.0. It provides a design environment for creating data transformation applications.
Data
Items representing facts, text, graphics, bit-mapped images, sound, analog or digital live-video segments. Data is the raw material of a system supplied by data producers and is used by information consumers to create information.
Factual information, especially information organized for analysis or used to reason or make decisions.
Data Access Tools
An end-user oriented tool that allows users to build SQL queries by pointing and clicking on a list of tables and fields in the data warehouse.
Data Analysis and Presentation Tools
Software that provides a logical view of data in a warehouse. Some create simple aliases for table and column names; others create data that identify the contents and location of data in the warehouse.
Data Consumer
An individual, group, or application that receives data in the form of a collection. The data is used for query, analysis and reporting.
Data Custodian
The individual assigned the responsibility of operating systems, data centers, data warehouses, operational databases and business operations in conformance with the policies and practices prescribed by the data owner.
Data Dictionary
A database about data and database structures. A catalog of all data elements, containing their names, structures, and information about their usage. A central location for metadata. Normally, data dictionaries are designed to store a limited set of available metadata, concentrating on the information relating to the data elements, databases, files and programs of implemented systems.
Data Element
The most elementary unit of data that can be identified and described in a dictionary or repository which cannot be subdivided.
Data Extraction Software
Software that reads one or more sources of data and creates a new image of the data.
Data Flow Diagram
A diagram that shows the normal flow of data between services as well as the flow of data between data stores and services.
Data Loading
The process of populating the data warehouse. Data loading is provided by DBMS-specific load processes, DBMS insert processes and independent fastload processes.
Data Management
Controlling, protecting, and facilitating access to data in order to provide information consumers with timely access to the data they need. The functions provided by a database management system.
Data Management Software
Software that converts data into a unified format by taking derived data to create new fields, merging files, summarizing and filtering data; the process of reading data from operational systems. Data Management Software is also known as data extraction software.
Data Mapping
The process of assigning a source data element to a target data element.
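A data mapping can be as simple as a lookup from source element names to target element names. In the Python sketch below, the legacy field names (`CUST_NO`, `CUST_NM`) and the target names are hypothetical.

```python
# Hypothetical mapping: source data element -> target data element.
FIELD_MAP = {
    "CUST_NO": "customer_id",
    "CUST_NM": "customer_name",
}


def map_record(source):
    """Copy each mapped source element into its target element."""
    return {target: source[src] for src, target in FIELD_MAP.items()}


legacy_record = {"CUST_NO": 42, "CUST_NM": "Acme Ltd", "FILLER": ""}
warehouse_record = map_record(legacy_record)
print(warehouse_record)
```

Source elements with no entry in the mapping (here `FILLER`) are not carried over.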
Data Mining
A technique using software tools geared for the user who typically does not know exactly what he's searching for, but is looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to produce data content relationships. This is also known as data surfing.
Data Model
A logical map that represents the inherent properties of the data independent of software, hardware or machine performance considerations. The model shows data elements grouped into records, as well as the association around those records.
Data Modeling
A method used to define and analyze data requirements needed to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and structures.
Data Owner
The individual responsible for the policy and practice decisions of data. For business data, the individual may be called a business owner of the data.
Data Partitioning
The process of logically and/or physically partitioning data into segments that are more easily maintained or accessed. Current RDBMS provide this kind of distribution functionality. Partitioning of data aids in performance and utility processing.
Data Pivot
A process of rotating the view of data.
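One way to picture a pivot is as swapping the row and column axes of a small cross-tabulation. The Python sketch below uses a hypothetical quarter-by-region table, and the `pivot` helper is introduced only for illustration.

```python
def pivot(table):
    """Swap the row and column axes of a nested dict {row: {column: value}}."""
    rotated = {}
    for row_label, cells in table.items():
        for col_label, value in cells.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


# Hypothetical view: quarters as rows, regions as columns.
by_quarter = {
    "Q1": {"East": 10, "West": 20},
    "Q2": {"East": 30, "West": 40},
}

by_region = pivot(by_quarter)  # regions as rows, quarters as columns
print(by_region)
```

The underlying values do not change; only the arrangement presented to the analyst does.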
Data Producer
A software service, organization, or person that provides data for update to a system-of-record.
Data Propagation
The distribution of data from one or more source data warehouses to one or more local access databases, according to propagation rules.
Data Replication
The process of copying a portion of a database from one environment to another and keeping the subsequent copies of the data in sync with the original source. Changes made to the original source are propagated to the copies of the data in other environments.
Data Scrubbing
The process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse.
Data Store
A place where data is stored; data at rest. A generic term that includes databases and flat files.
Data Surfing
See Data Mining.
Data Transfer
The process of moving data from one environment to another environment. An environment may be an application system or operating environment. See Data Transport.
Data Transformation
Creating "information" from data. This includes decoding production data and merging of records from multiple DBMS formats. It is also known as data scrubbing or data cleansing.
Data Transport
The mechanism that moves data from a source to target environment. See Data Transfer.
Data Warehouse
An implementation of an informational database used to store sharable data sourced from an operational database-of-record. It is typically a subject database that allows users to tap into a company's vast store of operational data to track and respond to business trends and facilitate forecasting and planning efforts.
A repository of corporate or institutional data that is organized in a way that is meaningful for Business Analysis and Reporting. It may also store historical information. A data warehouse is a collection of data marts.
Data Warehouse Architecture
An integrated set of products that enable the extraction and transformation of operational data to be loaded into a database for end-user analysis and reporting.
Data Warehouse Architecture Development
A service program, created by Software AG, that provides an architecture for a data warehouse that is aligned with the needs of the business. This program identifies and designs a warehouse implementation increment and ensures the required infrastructure, skill sets and other data warehouse foundational aspects are in place for a Data Warehouse Incremental Delivery.
Data Warehouse Engines
Relational databases (RDBMS) and Multi-dimensional databases (MDBMS). Data warehouse engines require strong query capabilities, fast load mechanisms and large storage requirements.
Data Warehouse Incremental Delivery
A program from Software AG that delivers one data warehouse increment from design review through implementation.
Data Warehouse Infrastructure
A combination of technologies and the interaction of technologies that support a data warehousing environment.
Data Warehouse Management Tools
Software that extracts and transforms data from operational systems and loads it into the data warehouse.
Data Warehouse Network
An integrated network of data warehouses that contain sharable data propagated from a source data warehouse on the basis of information consumer demand. The warehouses are managed to control data redundancy and to promote effective use of the sharable data.
Data Warehouse Orientation
A program from Software AG that provides an orientation to business and technical management of opportunities and approaches to data warehousing. The Orientation program encompasses a high level examination of solutions to business problems, return on investment, tools and techniques as they relate to data warehouse implementation. In addition, the program's objective is to assist customers in determining their readiness to proceed with data warehousing and to determine the appropriate data warehouse for their environment.
Database Schema
The logical and physical definition of a database structure.
DBA
Database Administrator.
Decentralized Database
A centralized database that has been partitioned according to a business or end-user defined subject area. Typically ownership is also moved to the owners of the subject area.
Decentralized Warehouse
A remote data source that users can query/access via a central gateway that provides a logical view of corporate data in terms that users can understand. The gateway parses and distributes queries in real time to remote data sources and returns result sets back to users.
Decision Support Systems (DSS)
Software that supports exception reporting, stop light reporting, standard repository, data analysis and rule-based analysis. A database created for end-user ad-hoc query processing.
Delta Update
Only the data that was updated between the last extraction or snapshot process and the current execution of the extraction or snapshot.
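A delta can be selected in several ways. The Python sketch below assumes, purely for illustration, that each source row carries an `updated_at` timestamp and that the time of the previous extraction is recorded. Both the field name and the sample rows are hypothetical.

```python
from datetime import datetime

# Hypothetical source rows, each carrying a last-modified timestamp.
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 3, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 3, 2, 14, 30)},
    {"id": 3, "updated_at": datetime(2024, 3, 3, 8, 15)},
]


def delta_rows(rows, last_extracted_at):
    """Return only the rows changed after the previous extraction."""
    return [row for row in rows if row["updated_at"] > last_extracted_at]


last_run = datetime(2024, 3, 2, 0, 0)
changed = delta_rows(source_rows, last_run)
print([row["id"] for row in changed])
```

Only rows 2 and 3 were modified after the last run, so only they would be sent to the warehouse.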
Denormalized Data Store
A data store that does not conform to one or more of the normal forms. See Normalization.
Derived Data
Data that is the result of a computational step applied to reference or event data. Derived data is the result either of relating two or more elements of a single transaction (such as an aggregation), or of relating one or more elements of a transaction to an external algorithm or rule.
Desktop Applications
Query and analysis tools that access the source database or data warehouse across a network using an appropriate database interface. An application that manages the human interface for data producers and information consumers.
DRDA
Distributed Relational Database Architecture. A database access standard defined by IBM.
Diving
See Drill Down and Data Mining.
Drill Down
A method of exploring detailed data that was used in creating a summary level of data. Drill down levels depend on the granularity of the data in the data warehouse.
DSS
See Decision Support System.
DWA
Data Warehouse Administrator.
Dynamic Dictionary
A data dictionary that an application program accesses at run time.
Data Visualization
A graphic representation of data; it is a way to clearly and effectively communicate information through graphical means. Data Visualization is the visual interpretation of complex relationships in multidimensional data.
Dashboard
A user interface that organizes and presents information in an easy-to-read format. Dashboards help align actions with strategy by tracking and analyzing key business metrics and goals. They enable proactive management through “what-if” analysis, customer segmentation, forecasting, and analysis of business processes. An Executive or Business Dashboard is a visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so the information can be monitored at a glance. Dashboards provide intuitive indicators, such as gauges, charts and bulbs, and show the state of the business at the instant the dashboard is viewed or refreshed.
Data Integration
This concept describes the retrieval and organization of business data from various sources producing a unified view for the end user; easier said than done.
Data Governance
This is an emerging discipline that embodies a convergence of data quality, data management, business process management, and risk management surrounding the handling of data in an organization.
Data Quality
Data quality pertains to aspects such as availability, completeness, accuracy, consistency, relevance and timeliness of data. High data quality is essential to business intelligence’s role as a means of decisional support. Poor data quality examples: missing fields, old or inaccurate information, data conflicts, inaccessible data in legacy systems.
Database
A collection of data arranged for ease and speed of search and retrieval. A database is organized in such a way that a computer program can quickly select desired pieces of data. You can think of a database as an electronic filing system.
Dynamic Queries
Dynamically constructed SQL that is usually constructed by desktop-resident query tools. Queries that are not pre-processed and are prepared and executed at run time.
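The sketch below shows one way a query tool might assemble SQL at run time from the filters a user selects, using Python's built-in `sqlite3` module. The `sales` table, the `ALLOWED_COLUMNS` whitelist, and the `build_query` helper are hypothetical. Values are passed as parameters rather than pasted into the SQL text, which is a design choice of this example and not something the definition requires.

```python
import sqlite3

ALLOWED_COLUMNS = {"region", "product"}  # hypothetical filterable columns


def build_query(filters):
    """Assemble a SELECT statement and its parameters at run time."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT SUM(amount) FROM sales"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 10.0), ("East", "Gadget", 5.0), ("West", "Widget", 7.0)],
)

# The statement text is not known until the user picks the filters.
sql, params = build_query({"region": "East", "product": "Widget"})
total = conn.execute(sql, params).fetchone()[0]
print(sql, total)
```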