TR-1: C. S. Jensen and R. T. Snodgrass, Semantics of Time-Varying Attributes and Their Use for Temporal Database Design
This paper concerns the design of temporal relational database schemas.
Normal forms play a central role during the design of conventional relational databases, and we have previously extended all existing relational normal forms to apply to temporal relations. However these normal forms are all a temporal in nature and do not fully take into account the temporal semantics of the attributes of temporal relations. Consequently additional guidelines for the design of temporal relations are required.
This paper presents a systematic study of important aspects of the temporal semantics of attributes. One such aspect is the observation and update patterns of attributes when an attribute changes value and when the changes are recorded in the database. A related aspect is when the attributes have values. A third aspect is the values themselves of attributes how to derive a value for an attribute at any point in time from stored values. Guidelines for the design of the logical schema of a temporal database are introduced and implications of the temporal attribute semantics for the design of views and the physical schema are considered. The Bitemporal Conceptual Data Model, the data model of the consensus temporal query language TSQL2, serves as the context for the study.
TR-2: K. Torp, C. S. Jensen, and M. Böhlen, Layered Temporal DBMSs—Concepts and Techniques
A wide range of database applications manage time-varying data. Examples include, e.g. accounting, personnel, schedule, and data warehousing applications. At the same time, it is well-known that querying and correctly updating time-varying data is difficult and error-prone when using standard SQL. As a result of a decade of intensive exploration, temporal extensions of SQL have reached a level of maturity and sophistication where it is clear that they offer substantial benefits over SQL when managing time-varying data.
The topic of this paper is the effective implementation of temporally extended SQL’s. Traditionally it has been assumed that a temporal DBMS must be built from scratch utilizing new technologies for storage indexing query optimization concurrency control, and recovery. This paper adopts a quite different approach. Speciffically, it explores the concepts and techniques involved in implementing a temporally enhanced SQL while maximally reusing the facilities of an existing SQL implementation, e.g., Oracle or DB2. The topics covered span the choice of an adequate times tamp domain that include the time variable “now” a comparison of alternative query processing architectures including a partial parser approach update processing and transaction processing the latter including how to ensure ACID properties and assign correct timestamps.
TR-3: H. Gregersen and C. S. Jensen, Temporal Entity-Relationship Models—a Survey
The Entity-Relationship (ER) Model, using varying notaions and with some semantic variations, is enjoying a remarkable, and increasing, popularity in both the research community, the computer science corriculum, and in the industry. In step with the increasing diffusion of relational platforms, ER modeling is growing in popularity. It has been widely recognized that temporal aspects of database schemas are prevalent and difficult to model using the ER model. As a result, how to enable the ER model to properly capture time-varying information has for a decade and a halp been an active area of the database research community. This has led to the proposal of almost a dozen temporally enhanded ER models.
This paper surveys all temporally enhanced ER models known to the authors. It is the first paper to provide a comprehensive overview of temporal ER modeling, and it thus meets a need for consilidating and providing easy access to the research in temporal ER modeling. In the presentation of each model, the paper examines how the time-varying information is captured in the model and present the concepts and modeling constructs of the model. A total of 20 different design properties for temporally enhanced ER models are defined, and each model is characterized according to these properties.
TR-4: K. Torp, R. Snodgrass, and C. S. Jensen, Correct and Efficient Timestamping of Temporal Data
Previous approaches to timestamping temporal data have implicitly assumed that transactions have no duration. In this paper we identify several situations where a sequence of operations over time within a single transaction can violate ACID properties. It has been previously shown that the transaction-time dimension must be timestamped after commit.
This time is not known within the transaction. We describe how to correctly implement most queries that make explicit reference to this (unknown) transaction time, and argue that the rest, which can be syntactically identified, can only be answered with an approximation of the correct value. The drawback of timestamping after commit is that it requires revisiting tuples. We show that this expensive revisiting step is required only before any queries or modifications in subsequent transactions that access prior states; in most cases, revisiting tuples can be postponed, and when to revisit can be syntactically determined. We propose several strategies for revisiting tuples, and we empirically evaluate these strategies in order to determine under which circumstances each is best.
TR-4rev: K. Torp, C. S. Jensen, and R. Snodgrass Effective Timestamping in Databases
Many existing database applications place various timestamps on their data, rendering temporal values such as dates and times prevalent in database tables. During the past two decades, several dozen temporal data models have appeared, all with timestamps being integral components. The models have used timestamps for encoding two specific temporal aspects of database facts, namely transaction time, when the fact are current in the database, and valid time, when the facts are true in the modeled reality. However, with a few exceptions, the assignment of timestamp values have been considered only in the context of individual modification statements.
This paper takes the next logical step: It considers the use of timestamping for capturing transaction and valid time in the context of transactions. The paper initially identifies and analyzes several problems with straightforward timestamping, then proceeds to propose a variety of techniques aimed at solving these problems. Timestamping the result of a transaction with the commit time of the transaction is the promised approach. The paper studies how this timestamping may be done using a spectrum of techniques. While many database facts are valid until now, the current time, this value is absent from the existing temporal types. Techniques that address this problem using different substitue values are presented. Using a stratum architecture, the performance of the different proposed techniques are studied. Although querying and modifying the time-varying data is accompanied by a number of subtle problems, we presne a comprehensive approach that provides application programmers with simple, consisten, and efficient support for modifying bitemporal databases in the context of user transactions.
TR-5: K. Torp, C. S. Jensen, and R. T. Snodgrass, Stratum Approaches to Temporal DBMS Implementation
Previous approaches to temporal databases have assumed that a temporal database management system (temporal DBMS) must be implemented from scratch, as an integrated architecture to yield adequate performance and to use new temporal implementation techniques, such as temporal indexes and join algorithms. However, this is a very large and time-consuming task. In this paper we explore how a temporal DBMS can be implemented in a stratum on top of an existing non-temporal DBMS, rendering the task more feasible because it reuses much of the functionality of the underlying conventional DBMS.
At the outset, we discuss the advantages and disadvantages of the stratum architecture compared to the integrated architecture, and we present a set of criteria for a stratum architecture. Subsequently, three different meta architectures for implementing a temporal DBMS in a stratum are identified. Each meta architecture contains several specific architectures, which are examined in turn. Existing temporal DBMS implementations are classified according to the specific architectures identified. Finally, the specific architectures are evaluated according to our criteria. We conclude that a stratum architecture is the best short, medium, and perhaps even long-term, approach to implementing a temporal DBMS. Further, it is possible to integrate existing conventional DBMSs with new temporal implementation techniques, blurring the differences between integrated and stratum architectures.
TR-6: J. Bair, M. Böhlen, C. S. Jensen, and R. T. Snodgrass, Notions of Upward compatibility of Temporal Query Languages
Migrating applications from conventional to temporal database management technology has received scant mention in the research literature. This paper formally defines three increasingly restrictive notions of upward compatibility which capture properties of a temporal SQL with respect to conventional SQL that, when satisfied, provide for a smooth migration of legacy applications to a temporal system. The notions of upward compatibility dictate the semantics of conventional SQL statements and constrain the semantics of extensions to these statements. The paper evaluates the seven extant temporal extensions to SQL, all of which are shown to complicate migration through design decisions that violate one or more of these notions. We then outline how SQL-92 can be systematically extended to become a temporal query language that satisfies all three notions.
TR-7: C. E. Dyreson and R. T. Snodgrass, Supporting Valid-time Indeterminacy
In valid-time indeterminacy it is known that an event stored in a database did in fact occur, but it is not known exactly when the event occurred. In this paper we extend the SQL data model and query language to support valid-time indeterminacy. We represent the occurrence time of an event with a set of possible instants, delimiting when the event might have occurred, and a probability distribution over that set. We also describe query language constructs to retrieve information in the presence of indeterminacy. These constructs enable users to specify their credibility in the underlying data and their plausibility in the relationships among that data. A denotational semantics for SQL’s select statement with optional credibility and plausibility constructs is given. We show that this semantics is reliable, in that it never produces incorrect information, maximal, in that if it were extended to be more informative, the results may not be reliable, and reduces to the previous semantics when there is no indeterminacy. Although the extended data model and query language provide needed modeling capabilities, these extensions appear initially to carry a significant execution cost. A contribution of this paper is to demonstrate that our approach is useful and practical. An efficient representation of valid-time indeterminacy and efficient query processing algorithms are provided. The cost of support for indeterminacy is empirically measured, and is shown to be modest. Finally, we show that the approach is general, by applying it to the temporal query language constructs being proposed for SQL3.
TR-8: R. T. Snodgrass, M. Böhlen, C. S. Jensen, and A. Steiner, Transitioning Temporal support in TSQL2 to SQL3
This document summarizes the proposals before the SQL3 committees to allow the addition of tables with valid-time and transaction-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to one encompassing temporal support. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.
TR-9: M. Böhlen, R. T. Snodgrass, and M. D. Soo, Coalescing in Temporal Databases
Coalescing is a unary operator applicable to temporal databases; it is similar to duplicate elimination in conventional databases. Tuples in a temporal relation that agree on the explicit attribute values and that have adjacent or overlapping time periods are candidates for coalescing. Uncoalesced relations can arise in many ways, e.g., via a projection or union operator, or by not enforcing coalescing on update or insertion. In this paper we show how semantically superfluous coalescing can be eliminated. We then turn to efficiently performing coalescing. We provide a variety of iterative and non-iterative approaches, via SQL and embedded SQL, that require no changes to the DBMS, demonstrating that coalescing can be formulated in SQL-89. Detailed performance studies show that all such approaches are quite expensive. We propose a spectrum of coalescing algorithms within a DBMS, based on nested-loop, explicit partitioning, explicit sorting, temporal sorting, and combined explicit/temporal sorting, as well as their hybrid variants. These algorithms are empirically compared, paying particular attention to attribute skew and timestamp distributions. The experiments show that coalescing can be implemented with reasonable efficiency, and with modest development cost.
TR-10: V. J. Tsotras, C. S. Jensen, and R. T. Snodgrass, A Notation for Spatiotemporal Queries
Temporal, spatial and spatiotemporal queries are inherently multidimensional, combining predicates on time dimension(s) with predicates on explicit attributes and/or several spatial dimensions. In the past there was no consistent way to refer to temporal or spatiotemporal queries, thus leading to considerable confusion. In an attempt to eliminate this problem, we propose a new notation for such queries. Our notation is simple and extensible and can be easily applied to multidimensional queries in general.
TR-11: A. Datta, S. Mukherjee, P. Konana, I. Viguier, and A. Bajaj, Multiclass Transaction Scheduling and overload Management in Firm Real-Time Database Systems
Real-Time Database Systems (RTDBSs), have attracted considerable amount of research attention in the recent past and a numb er of imp ortant applications have b een identi ed for such systems, such as telecommunications network management, automated air traffc control, automated nancial trading, pro cess control and military command and control systems. Due to the rapidity of change of the system state in such applications, as well as the inherent complexities in controlling such systems (which result in frequent violation of consistency requirements and consequent rep eated rings of control actions), it is likely that the transaction load in these systems would b e fairly high. Thus RTDBSs need to b e equipp ed with overload management mechanisms. Unfortunately overload management has b een a fairly neglected area in real-time systems research in general and real-time database research in particular. In this pap er we intro duce Adaptive Access Parameter (AAP), a scheduling mechanism for multiclass transactions in real-time database systems that employs an explicit admission control p olicy to manage overload as well as system bias towards particular transaction classes. We show the theoretical underpinnings b ehind AAP and then rep ort a thorough p erformance study that demonstrates AAP’s substantial sup eriority over current algorithms with regard to p erformance metrics as well as computational overhead.
TR-12: A. Datta and I. Viguier, Providing Real-Time Response, State Recency and Temporal Consistency in Databases for Rapidly Changing Environments
Databases have been proposed as the chosen platform to develop automated control systems for environments where the state of the system changes rapidly, and actions need to be taken with timing constraints, such as stock trading, air traffc control, network management and pro cess control to name a few. The workload in such systems consist primarily of two types: (a) updates, which report the state of the environment to the database and arrive very frequently, and (b) transactions which arrive with deadlines. In this paper we develop algorithms to install these updates rapidly enough to ensure “state recency” while allocating suffcient resources to transactions to minimize tardiness. We also require that these transactions read temporally consistent data. In summary, this paper contributes in the following three interrelated issues:
1. First and foremost, we look at some interesting problems regarding amalgamating temporal and real-time databases
2. We look at simultaneous satisfaction of state recency and temporal consistency which are central issues in the class of systems considered in this paper, i.e., rapidly changing systems
3. We design algorithms that attempt to achieve equilibrium with regard to the conjoint processing of frequently arriving state up dates and time constrained transactions
TR-13: A. Datta, A. Celik, J. Kim, D. VanderMeer, and V. Kumar, Adaptive Broadcast Protocols to Support Efficient and Energy Conserving Retrieval from Databases in Mobile Computing Environments
Mobile computing has the potential for managing information globally. Data management issues in mobile computing have received some attention in recent times, and the design of adaptive broadcast protocols has been posed as an important problem. Such protocols are employed by database servers to decide on the content of broadcasts dynamically, in response to client mobility and demand patterns. In this paper we design such protocols and also propose efficient retrieval strategies that may be employed by clients to download information from broadcasts. The goal is to design cooperative strategies between server and client to provide access to information in such a way as to minimize energy expenditure by clients. We evaluate the performance of our protocols analytically.
TR-14: A. Steiner and M. C. Norrie, Implementing Temporal Databases in Object-Oriented Systems
We present a temporal object data model, query language and system that support temporal database applications. We then show how equivalent temporal constructs and operations could be provided in existing object-oriented database management systems (OODBMS) and describe how we did this in the O2 system. A comparison of the two resulting systems highlights the current limitations to the notions of extensibility supported in existing OODBMS.
TR-15: A. Steiner and M. C. Norrie, A Temporal Extension to a Generic Object Data Model
We present a temporal object model capable of representing the lifespan of objects and also the history of the roles and associations in which they participate. We advocate an approach of temporal generalisation rather than temporal extension in which a model in its entirety is given a temporal semantics through an orthogonal generalisation of all concepts of the model. Our model allows objects to participate in several roles simultaneously. Further, metadata may also have temporal properties allowing the lifespan of the object roles themselves, and constraints over these roles, to be modelled. The model is based on the generic object data model OM and the asso ciated algebra has been generalised into a full temp oral algebra over object roles.
TR-16: J. Wijsen and R. Meersman, On the Complexity of Mining Temporal Trends
We investigate the computational complexity of mining certain trends in temporal databases. A simple example of such trend might be “In general, salaries of employees do not decrease.” The trends considered are formalized by the construct of trend dependency (TD). TD’s can compare attributes over time by using operators of {<; =; >; ≤ ; ≥;≠}. TD satisfaction is characterized by a support and confidence. As TD’s may express meaningful trends, mining them is significant. The TD mining problem studied is the following task: Given a temporal database, find the TD of a specified form that holds with the highest confidence and with support greater than or equal to a specified minimum threshold. This problem is called TDMINE. Unlike most other work in data mining, we primarily focus on the computational complexity of the TDMINE problem—rather than on the performance of algorithms to solve it. Both the number of tuples (cardinality) and the number of attributes can be taken as the “size” of TDMINE. TDMINE can be solved in O (C^2) time where C is the cardinality. If the time requirements are expressed in function of the number of attributes—rather than the cardinality—then the problem turns out NP-complete. We discuss the practical implications of this result.
TR-17: C. S. Jensen and R. T. Snodgrass, Temporal Data Management
A wide range of database applications manage time-varying information. Existing database technology currently provides little support for managing such data. The research area of temporal databases has made important contributions in characterizing the semantics of such information and in providing expressive and efficient means to model, store, and query temporal data. This paper introduces the reader to temporal data management, surveys state-of-the-art solutions to challenging aspects of temporal data management, and points to research directions.
TR-18: B. Salzberg and V. J. Tsotras, A Comparison of Access Methods for Temporal Data
This paper attempts a comparison of different indexing techniques which have been proposed for supporting efficient access to temporal data. The comparison is based on a collection of important performance criteria that include the consumed space, the update processing and the query time for representative queries. Since it was not possible to compare actual method implementations against the same data sets, the comparison is based on worst case analysis, hence no assumptions on data distribution or query frequencies are made. When a number of methods have the same asymptotic worst case behavior, features in the methods which affect average case behavior are discussed. Additional criteria examined are the pagination of an index, the ability to cluster related data together and the ability to efficiently separate old from current data (so that larger archival storage media such as write-once optical disks can be used). The purpose of the paper is to identify the difficult problems in accessing temporal data and provide a good description of how the different methods aim to solve them. A general lower bound for answering basic temporal queries is also introduced.
TR-19: Hong Lin, Efficient Convertion Between Temporal Granularities
A temporal granularity is a unit of measuring time, e.g., second, day, week. A granularity graph is a directed graph showing the relationship among the granularities. Efficiently and correctly converting time values within the granularity graph is critical for supporting multiple time granularities in an application program or a database management system. The research involves finding an optimally efficient path in the granularity graph for any pair of granularities and developing an algorithm to perform the conversion operation between the two granularities for anchored time related values, to correctly convert a granule from a specified granularity to another granularity. The research also evaluates several strategies to improve the performance of temporal operations at mixed granularities.
TR-20: M. Böhlen, C. S. Jensen, and B. Skjelluag, Spatio-Temporal Database Support for Legacy Applications
In areas such as finance, marketing, and property and resource management, many database applications manage spatio-temporal data. These applications typically run on top of a relational DBMS and manage spatio-temporal data either using the DBMS, which provides little support, or employ the services of a proprietary system that co-exists with the DBMS, but is separate from and not integrated with the DBMS. This wealth of applications may benefit substantially from built-in, integrated spatio-temporal DBMS support. Providing a foundation for such support is an important and substantial challenge.
This paper initially defines technical requirements to a spatio-temporal DBMS aimed at protecting business investments in the existing legacy applications and at reusing personnel expertise. These requirements provide a foundation for making it economically feasible to migrate legacy applications to a spatio-temporal DBMS. The paper next presents the design of the core of a spatio-temporal extension to SQL–92, called STSQL, that satisfies the requirements. STSQL supports multiple temporal as well as spatial dimensions. Queries may “ignore” any dimension; this provides an important kind of upward compatibility with SQL–92. Queries may also view the tables in a dimensional fashion, where the DBMS provides so-called snapshot reducible query processing for each dimension. Finally, queries may view dimension attributes as if they are no different from other attributes.
TR-21: M. Böhlen, R. Busatto, and C. S.Jensen, Point versus Interval-based Temporal Data Models
The association of timestamps with various data items such as tuples or attribute values is fundamental to the management of time-varying information. Using intervals in timestamps, as do most data models, leaves a data model with a variety of choices for giving a meaning to timestamps. Specifically, some such data models claim to be point-based while other data models claim to be interval-based. The meaning chosen for timestamps is important—it has a pervasive effect on most aspects of a data model, including database design, a variety of query language properties, and query processing techniques, e.g., the availability of query optimization opportunities.
This paper precisely defines the notions of point-based and interval-based temporal data models, thus providing a new, formal basis for characterizing temporal data models and obtaining new insights into the properties of their query languages. Queries in point-based models treat snapshot equivalent argument relations identically. This renders point-based models insensitive to coalescing. In contrast, queries in interval-based models give significance to the actual intervals used in the timestamps, thus generally treating non-identical, but possibly snapshot equivalent, relations differently. The paper identifies the notion of time-fragment preservation as the essential defining property of an interval-based data model.
TR-22: G. Di Mauro and L. A. Huebel, A Language Coverage Test Suite for TSQL2
TSQL2 is a temporal query language, designed to query and manipulate time-varying data stored in a relational database. TSQL2 is an upward-compatible extension of the international standard relational query language SQL-92. This document presents a TSQL2 test suite that utilizes many of the features of the TSQL2 temporal query language. The test suite consists of a set of table creation statements, a set of data manipulation statements, and a set of queries. This document includes the result of running the TSQL2 test suite on the 3.01 TSQL2 prototype DBMS from the University of Arizona.
TR-23: R. Bliujute, S. Saltenis, G. Slivinskas, and C. S. Jensen, Systematic Change Management in Dimensional Data Warehousing
With the widespread and increasing use of data warehousing in industry, the design of effective data warehouses and their maintenance has become a focus of attention. Independently of this, the area of temporal databases has been an active area of research for well beyond a decade. This article identifies shortcomings of so-called star schemas, which are widely used in industrial warehousing, in their ability to handle change and subsequently studies the application of temporal techniques for solving these shortcomings.
Star schemas represent a new approach to database design and have gained widespread popularity in data warehousing, but while they have many attractive properties, star schemas do not contend well with so-called slowly changing dimensions and with state-oriented data. We study the use of so-called temporal star schemas that may provide a solution to the identified problems while not fundamentally changing the database design approach. More specifically, we study the relative database size and query performance when using regular star schemas and their temporal counterparts for state-oriented data. We also offer some insight into the relative ease of understanding and querying databases with regular and temporal star schemas.
TR-24: G. Kollios and V. J. Tsotras, Hashing Methods for Temporal Data
External dynamic hashing has been used in traditional database systems as a fast method to answer membership queries. Given a dynamic set S of objects, a membership query asks whether an object with identity k is in the most current S. This paper addresses the more general problem of Temporal Hashing. In this setting changes to the dynamic set are timestamped and the membership query has a temporal predicate, as in: “find whether object with identity k was in the setS at time t”. We present an efficient solution to the Temporal Hashing problem. Our solution, also termed partially persistent hashing, behaves as if a separate, ephemeral (i.e., non-temporal) dynamic hashing scheme is available on every state assumed by setS over time. However if the buckets of these hashing schemes were to be stored for each time of interest, the space would become prohibitively large (quadratic on the total number of changes in set S‘s evolution); instead, our method uses linear space. We compare partially persistent hashing with various straightforward approaches (like the traditional linear hashing scheme, the R-tree and the Multiversion B-Tree) and it provides the faster membership query response time. Partially persistent hashing should be seen as an extension of traditional external dynamic hashing in a temporal environment. It is independent from which ephemeral dynamic hashing scheme is used. While the paper considers linear hashing, the methodology applies to other dynamic hashing schemes as well.
TR-25: R. Bliujute, C. S. Jensen, S. Saltenis, and G. Slivinskas, R-tree Based Indexing of Now-Relative Bitemporal Data
The databases of a wide range of applications, e.g., in data warehousing, store multiple states of time-evolving data. These databases contain a substantial part of now-relative data: data that became valid at some past time and remains valid until the current time. More specifically, two temporal aspects of data are frequently of interest, namely valid time, when data is true, and transaction time, when data is current in the database. The latter aspect is essential in all applications where accountability or trace-ability are required. When both aspects are captured, data is termed bitemporal.
A number of indices have been devised for the efficient support of operations on time-varying data with one time dimension, but only little work, based mostly on R-trees, has addressed the indexing of two- or higher-dimensional temporal data. No indices exist that contend well with now-relative data, which leads to temporal data regions that are continuous functions of time. The paper proposes two extended R*-trees that permit the indexing of data regions that grow continuously over time, by also letting the internal bounding regions grow. Internal regions may be triangular as well as rectangular, and new heuristics for the algorithms that govern the index structure are provided. As a result, dead space and overlap, now also continuous functions of time, are reduced. Performance studies indicate that the best extended index is typically 3-5 times faster than existing R-tree based indices.
TR-26: M. A. Nascimento and M. H. Dunham., Indexing Valid Time Databases Via B+-trees – The MAP21 Approach
We present an approach named MAP21 which uses standard B+-trees, in a multiple disks single processor architecture, to provide efficient indexing of valid time ranges. The approach is based on mapping one dimensional ranges to one dimensional points where the lexicographical order among the ranges is preserved. We compare MAP21 to the Time Index and the R*-tree and we show that MAP21 is comparable to or outperforms those, depending on the number of trees utilized, the degree of parallelization among these and the type of query. The main contribution of this paper is to show that standard B+-trees, available in virtually any DBMS, can be used to provide an efficient temporal index. Although our discussion is made in terms of valid time databases, MAP21 can be used (or extended to be used) within other application domains as well.
TR-27: G. Faria, C. M. B. Medeiros and M. A. Nascimento, An Extensible Framework for Spatio-Temporal Scientific Database Applications
There is a wide range of scientific application domains requiring sophisticated management of spatio-temporal data. However, existing database management systems offer very limited (if any at all) support for managing such data. Thus, it is left to the researchers themselves to repeatedly code this management into each application. Besides being a time consuming task, this process is bound to introduce errors and increase the complexity of application management and data evolution. This paper addresses this very point. We present an extensible framework, based on extending an object-oriented database system, with kernel spatio-temporal classes, data structures and functions, to provide support for the development of spatio-temporal applications. Even though the paper’s arguments are centered on geographic applications, the proposed framework can be used in other application domains where spatial and temporal data evolution must be considered (e.g., Biology).
TR-28: R. T. Snodgrass, Managing Temporal Data: A Five-Part Series
Temporal data is pervasive, and challenging to manage in SQL. The June through October issues of Database Programming and Design (volume 11, issues 6-10) included a special series on temporal databases; the five articles in that series are reproduced here. Three separate case studies: a neonatal intensive care unit, a commercial cattle feed yard, and astronomical star catalogs, were used to illustrate how temporal applications can be implemented in SQL. The concepts of valid time versus transaction time and of current, sequenced and nonsequenced integrity constraints, queries, and modifications were emphasized.
TR-29: R. Bliujute,S. Saltenis, G. Slivinskas, and C. S. Jensen, Developing a DataBlade for a New Index
Many current and potential applications of database technology, e.g., geographical, medical, spatial, and multimedia applications, require efficient support for the management of data with new, complex data types. As a result, the major DBMS vendors are stepping beyond the support for uninterpreted binary large objects, termed BLOBs, and are beginning to offer extensibility features that allow external developers to extend the DBMS with, e.g., their own data types and accompanying access methods. Existing solutions include DB2 extenders, Informix DataBlades, and Oracle cartridges. Extensible systems offer new and exciting opportunities for researchers and third-party developers alike.
This paper reports on an implementation of an Informix DataBlade for the GR-tree, a new R-tree based index. This effort represents a stress test of what is perhaps currently the most extensible DBMS, in that the new DataBlade aims to achieve better performance, not just to add functionality. The paper provides guidelines for how to create an access method DataBlade, describes the sometimes surprising challenges that must be negotiated during DataBlade development, and evaluates the extensibility of the Informix Dynamic Server.
TR-30: R. Bliujute, C. S. Jensen, S. Saltenis, and G. Slivinskas, Light-Weight Indexing of General Bitemporal Data
Most data managed by existing, real-world database applications is time referenced. Data warehouses are good examples. Often, two temporal aspects of data are of interest, namely valid time, when data is true in the mini-world, and transaction time, when data is current in the database, resulting in so-called bitemporal data. Like spatial data, bitemporal data thus has associated two-dimensional regions. Such data is in part naturally now-relative: some data is currently true in the mini-world or is part of the current database state. So, unlike for spatial data, the regions of now-relative bitemporal data grow continuously. Existing indices, including commercially available indices such as B+- and R-trees, typically do not contend well with even small amounts of now-relative data.
This paper proposes a new indexing technique that indexes general bitemporal data efficiently. The technique eliminates the different kinds of growing data regions by means of transformations and then indexes the resulting stationary data regions with four R*-trees, and queries on the original data are mapped to corresponding queries on the transformed data. Extensive performance studies are reported that provide insight into the characteristics and behavior of the four trees storing differently-shaped regions, and they indicate that the new technique yields a performance that is competitive with the best existing index; and unlike this existing index, the new technique does not require extension of the kernel of the DBMS.
TR-31: C. E. Dyreson, W. S. Evans, H. Lin, and R. T. Snodgrass, Efficiently Supporting Temporal Granularities
Granularity is an integral feature of temporal data. For instance, a person’s age is commonly given to the granularity of years and the time of their next airline flight to the granularity of minutes. A granularity creates a discrete image, in terms of granules, of a (possibly continuous) time-line. We present a formal model for granularity in temporal operations that is integrated with temporal indeterminacy, or “don’t know when” information. We also minimally extend the syntax and semantics of SQL-92 to support mixed granularities. This support rests on two operations, scale and cast, that move times between granularities, e.g., from days to months. We demonstrate that our solution is practical by showing how granularities can be specified in a modular fashion, and by outlining a time- and space-efficient implementation. The implementation uses several optimization strategies to mitigate the expense of accommodating multiple granularities.
TR-32: J. Skyt and C. S. Jensen, Vacuuming Temporal Databases
A wide range of real-world database applications, including financial and medical applications, are faced with accountability and trace-ability requirements. These requirements lead to the replacement of the usual update-in-place policy by an append-only policy, yielding so-called transaction-time databases. With logical deletions being implemented as insertions at the physical level, these databases retain all previously current states and are ever-growing. A variety of physical storage structures and indexing techniques as well as query languages have been proposed for transaction-time databases, but the sup- port for physical deletion, termed vacuuming, has received precious little attention. Such vacuuming is called for by, e.g., the laws of many countries. Although necessary, with vacuuming, the database’s previously perfect and reliable recollection of the past may be manipulated via, e.g., selective removal of records pertaining to past states. This paper provides a semantic framework for the vacuuming of transaction-time databases. The main focus is to establish a foundation for the correct and user-friendly processing of queries and updates against vacuumed databases. Queries that may return results affected by vacuuming are intercepted, and the user is presented with the option of issuing similar queries that are not affected by vacuuming.
TR-33: M. A. Nascimento, J. R. O. Silva, and Y. Theodoridis, Access Structures for Moving Points
Several applications require management of data which is spatially dynamic, e.g., tracking of battle ships or moving cells in a blood sample. The capability of handling the temporal aspect, i.e., the history of such type of data, is also important. This paper presents and evaluates three temporal extensions of the R-tree, the 3D R-tree, the 2+3 R-tree and the HR-tree, which are capable of indexing spatiotemporal data. Our experiments have shown that the while the HR-tree was the larger structure, its query processing cost was over 50% smaller than the ones yielded by the 3D R-tree and the 2+3 R-tree. Also compared to the (non-practical) approach of storing one R-tree for each of the spatial database states it offered the same query processing cost, saving around one third of storage space.
TR-34: D. Pfoser and C. S. Jensen, Incremental Join of Time-Oriented Data
Data warehouses as well as a wide range of other databases exhibit a strong temporal orientation: it is important to track the temporal variation of data over several months or, often, years. In addition, data warehouses and databases often exhibit append-only characteristics where old data pertaining to the past is retained while new data pertaining to the present is appended. Performing joins on large databases such as these can be very costly, and the efficient processing of joins is essential to obtain good overall query processing performance. This paper presents a sort-merge-based incremental algorithm for timeoriented data. While incremental computation techniques have proven competitive in many settings, they also introduce a space overhead in the form of differential files. However, for the temporal data explored here, this overhead is avoided because the differential files are already part of the database. In addition, data is naturally sorted, leaving only merging. The incremental algorithm works in a partitioned storage environment and does not assume the availability of indices, making it a competitor to sort-based and nested-loop joins. The paper presents analytical cost formulas as well as simulation-based studies that characterize the performance of the join.
TR-35: H. Gregersen and C. S. Jensen, Conceptual Modeling of Time-Varying Information
A wide range of database applications manage information that varies over time. Many of the underlying database schemas of these were designed using one of the several versions, with varying syntax and semantics, of the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that the temporal aspects of the mini-world are pervasive and important, but are also difficult to capture using the ER model. Not surprisingly, several enhancements to the ER model have been proposed in an attempt to more naturally and elegantly support the modeling of temporal aspects of information. Common to the existing temporally extended ER models, few or no specific requirements to the models were given by their designers.
With the existing proposals, an ontological foundation, and novel requirements as its basis, this paper formally defines a graphical, temporally extended ER model. The ontological foundation serves to aid in ensuring a maximally orthogonal design, and the requirements aim, in part, at ensuring a design that naturally extends the syntax and semantics of the regular ER model. The result is a novel model that satisfies an array of properties not satisfied by any single previously proposed model.
TR-36: C. E. Dyreson, M. H. Böhlen, and C. S. Jensen, Capturing and Querying Multiple Aspects of Semistructured Data
Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query languages have emerged that contend with the semistructured nature of web data. These models organize data in graphs. The nodes in a graph denote objects or values, and each edge is labeled with a single word or phrase. Nodes are described by the labels of the paths that lead to them, and these descriptors serve as the basis for querying.
This paper proposes an extinsible framework for capturing more data semantics in the semistructured data models. Inspired by the multidimensional paradigm that finds application in on-line analytical processing and data warehousing, the framework makes it possible to associate values drawn from an extensible set of dimensions with edges. THe paper considers dimensions that capture temporal aspects of data, prices associated with data access, quality ratings associated with the data, and access restrictions on the data. In this way, it accommodates notions from temporal databases, electronic commerce, information quality, and database security.
The paper defines the extensible data model and an accompanying query language that provides new facilities for matching, slicing, and collapsing the enriched paths and for coalescing edges. The paper describes an implemented, SQL-like query language for the extended data model that includes additional constructs for the effective querying of graphs with enriched paths.
TR-37: T. B. Pedersen and C. S. Jensen, Multidimensional Data Modeling for Complex Data
Systems for On-Line Analytical Processing (OLAP) considerably ease the process of analyzing business data and have become widely used in industry. OLAP systems primarily employ multidimensional data models to structure their data. However, current multidimensional data models fall short in their ability to model the complex data found in some real-world application domains. The paper presents nine requirements to multidimensional data models, each of which is exemplified by a real-world, clinical case study. A survey of the existing models reveals that the requirements not currently met include support for many-to-many relationships between facts and dimensions, built-in support for handling change and time, and support for uncertainty as well as different levels of granularity in the data. The paper defines an extended multidimensional data model, which addresses all nine requirements. Along with the model, we present an associated algebra, and outline how to implement the model using relational databases.
TR-38: J. R. O. Silva and M. A. Nascimento, An Incremental Index for Bitemporal Databases
Bitemporal databases record not only the history of tuples in temporal tables, but also record the history of the databases themselves. Indexing structures, which are a critical issue in traditional databases, became even more critical for bitemporal databases. We address this problem by investigating an incremental indexing structure based on R-trees, called the HR-tree, which was originally aimed at spatiotemporal databases. We have found that the HR-tree is much more efficient (up to 80% faster) than previously proposed approaches based on two R-trees when processing transaction time point based queries. As for size, the HR-tree was found to be better suited for medium to large sized batch updates, otherwise it is prone to be quite large.
TR-39: H. Gregersen, L. Mark, C. S. Jensen, Mapping Temporal ER Diagrams to Relational Schemas
Many database applications manage information that varies over time, and most of the database schemas for these applications were designed using one of the several versions of the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that temporal aspects of data are pervasive and important to the applications, but are also difficult to capture using the ER model. The research community has developed temporal ER models, in an attempt to provide modeling constructs that more naturally and elegantly support capturing the temporal aspects. Specifically, the temporal models provide enhanced support for capturing aspects such as lifespans, valid time, and transaction time of data.
Because commercial database management systems support neither the ER model nor any temporal ER model as a model for data manipulation—but rather support various versions of the relational model for this purpose—we provide a two-step transformation from temporal ER diagrams, with built-in support for lifespans and valid and transaction time, to relational schemas. The first step of the algorithm translates a temporal ER diagram into relations in a surrogate-based relational target model; and the second step further translates this relational schema into a schema in a lexically-based relational target model.
TR-40: T. Theodoridis, J. R. R. Silva, M. A. Nascimento, On the Generation of Spatiotemporal Datasets
An efficient benchmarking environment for spatiotemporal access methods should at least include modules for: generating synthetic datasets, storing datasets (real datasets included), collecting and running access structures, and visualizing experimental results. Focusing on the dataset repository module, a collection of synthetic data that would simulate a variety of real life scenarios is required. Several algorithms have been implemented in the past to generate static spatial (point or rectangular) data, for instance, following a predefined distribution in the workspace. However, by introducing motion, and thus temporal evolution in spatial object definition, generating synthetic data tends to be a complex problem. In this paper, we discuss the parameters to be considered by a generator for such type of data, propose an algorithm, called “Generate_Spatio_Temporal_Data” (GSTD), which generates sets of moving point or rectangular data that follow an extended set of distributions. Some actual generated datasets are also presented. The GSTD source code and several illustrative examples are currently available in the Internet.
TR-41: N. Kline, J. Li and R. Snodgrass, Specifying Multiple Calenders, Calendric Systems, and Field Tables and Functions in TimeADT
TIMEADT provides multiple calendar support for C and C++ applications. We describe here the TIMEADT automatic generation tool which provides the capability of configuring the TIMEADT system with different calendars, calendric systems, properties. This tool takes calendar specification files and TIMEADT specification file and generates a C source file and a C header file which contain code for integrating calendars and calendric systems into TIMEADT (These generated files can also be used automatically by C++ classes in TimeADT).
Each calendar has a calendar specification file which defines the temporal granularities within the calendar and additional field names. The TIMEADT specification file defines the calendric systems used by the application, as well as the location of calendar specification files, initial property values, aliases for field value functions and tables, renaming and renumbering of granularities, mappings between granularities from different calendars, and declarations of a operating system time function and a mapping between operating system time unit and a particular granularity.
TR-42: B. Moon, J. A. G. Gendrano, M. Park, R. T. Snodgrass, B. C. Hwang, and J. M. Rodrigue, Parallel Aggregations for Temporal Databases
The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and stringency of resp onse time requirements has out-paced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this pap er, we intro duce a variety of parallel temporal aggregation algorithms for shared-nothing architectures; these algorithms are based on the sequential Aggregation Tree algorithm. Via an empirical study, we found that the numb er of pro cessing nodes, the partitioning of the data, the placement of results, and the degree of data reduction e ected by the aggregation impacted the p erformance of the algorithms. For distributed result placement, we discovered that Greedy Time Division Merge was the obvious choice. For centralized results and high data reduction, Pairwise Merge was preferred for a large numb er of pro cessing nodes; for low data reduction, it only p erformed well up to 32 nodes. This led us to a centralized variant of Greedy Time Division Merge, which was best for the remaining cases.
TR-43: K. Torp, C. S. Jensen, and R. T. Snodgrass, Modification Semantics in Now-Relative Databases
Most real-world databases record time-varying information. In such databases, the notion of “the current time,” or NOW, occurs naturally and prominently. For example, when capturing the past states of a relation usinig begin and end time attributes, tuples that are part of the current state have some past time as their begin time and NOW as their end time. While the semantics of such variable databases has been described in detail and is well understood, the modification of variable databases remain unexplored.
This paper defines the semantics of modifications involving the variable NOW are explored, illustrating that the main problems are with modifications and tuples that reach into the future. The paper defines the semantics of modifications — including insertions, deletions, and updates — if databases without NOW, with NOW, and with values of the type NOW + △, where △ is a non-variable time duration. To accomodate these sematnics, three new timestamp values are introduced. An approximate semantics that does not rely on new timestamp values is also provided. Finally, implementation is explored.
TR-44: S. Saltenis, C. S. Jensen, S. Leutenegger, and M. Lopez, Indexing the Positions of Continuously Moving Objects
The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R*-tree based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one-, two-, and three- dimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. In addition, a bulkloading algorithm is provided for building and rebuilding the index. A comprehensive performance study is reported.
TR-45: S. Saltenis, C. S. Jensen, R-Tree Based Indexing of General Spatio-Temporal Data
Real-world objects are inherently spatially and temporally referenced, and many database applications rely on databases that record the past, present, and anticipated future locations of, e.g., people or land parcels. As a result, indices that efficiently support queries on the spatio-temporal extents of objects are needed. In contrast, past indexing research has progressed in largely separate spatial and temporal streams. In the former, focus has been on one-, two-, or three-dimensional space; and in the latter, focus has been on one or both of the temporal aspects, or dimensions, of data known as transaction time and valid time. Adding time dimensions to spatial indices, as if time was a spatial dimension, neither supports nor exploits the special properties of time. On the other hand, temporal indices are generally not amenable to extension with spatial dimensions.This paper proposes an efficient and versatile technique for the indexing of spatio-temporal data with discretely changing spatial extents: the spatial aspect of an object may be a point or may have an extent; both the transaction time and valid time are supported; and a generalized notion of the current time, now, is accommodated for the temporal dimensions. The technique extends the previously proposed R*-tree and borrows from the GR-tree, and it provides means of prioritizing space versus time, enabling it to adapt to spatially and temporally restrictive queries. Performance experiments were performed to evaluate different aspects of the proposed indexing technique, and are included in the paper.
TR-46: F. Grandi, F. Mandreoli, The Valid Web: it’s Time to Go…
The development of the World Wide Web technology on the Internet is a major achievement of computer research in recent years. The great availability of easy-to-access on-line hypermedial documents has been a real revolution in the information world, also for its important social, cultural and economical consequences. However, a scarce attention has been devoted so far to the temporal aspects of the World Wide Web definition and technology, although time undoubtedly plays a pervasive and fundamental role in the reality and in its information counterpart. Moreover, although the management of time-varying information is a consolidated research issue in the database field, no integrations between the World Wide Web and database worlds has been attempted in this respect yet. Notwithstanding, this happens while there is a substantial convergence between the two worlds, with a renewed interest on the XML emergence as the upcoming standard for the representation and exchange of semistructured data on the Internet.
In this work, we present a temporal extension of the World Wide Web by defining a complete XML/XSL infrastructure to support valid-time. The proposed valid-time versioning scheme allows the explicit definition of temporal information within HTML/XML documents, whose contents can then be selectively accessed on the basis of the validity coded within them. By acting on a navigation validity context, the proposed solution makes it possible to navigate through time in a given virtual environment with any XML-compliant browser; this allows, for instance, to cut personalized visit routes for a specific epoch in a virtual museum or a digital historical library, or to visualize the evolution of an archaeological site through successives ages, or to selectively access past issues of magazines, stock quote archives, etc. Moreover, the proposed temporal extension can also be applied in a straightforward way to generic XML-encoded semistructured data, laying the foundations for the management of temporal XML data and the development of temporal XML query languages. In particular, the realization of a temporal virtual museum and the management of temporal XML data (via TSQL2-like query facilities) with the proposed temporal extensions have been shown feasible through a prototype demonstration World Wide Web site, which will also be described in the paper.
TR-47: J. Wijsen, Temporal Dependencies with Order Constraints
We propose a temporal dependency, called trend dependency (TD), which captures a significant family of data evolution regularities. A simple example of such regularity might be: “ Salaries of employess do not decrease.” TDs compare attributes over time using operators of {<; =; >; ≤ ; ≥;≠}. We define a satisfiability problem that is the dual of the logical implication problem for TDs, and we investigate the computational complexity of both problems. We provide an axiomatization of logical implication. As TDs allow expressing meaningful trends, “mining” them from existing databases is interesting. For the purpose of TD mining, TD satisfaction is characterized by the common notions of support and confidence. We study the probelm TDMINE: Given a temporal database, mine the TDs that conform to a given template and whose support and confidence exceed certain threshold values. The complexity of TDMINE is studied, as well as algorithms to solve it. A comparison with related work is provided at the end of the paper. We show that TDs can express several temporal dependencies found in the literature.
TR-48: J. Wijsen and R. T. Ng, Generalizing Temporal Dependencies for Non-Temporal Dimensions
Recently, there has been a lot of interest in temporal granularity, and its applications in temporal dependency theory and data mining. Generalization hierarchies used in multi-dimensional databases and OLAP serve a role similar to that of time granularity in temporal databases, but they also apply to non-temporal dimensions, like space.
In this paper, we first generalize temporal functional dependencies for non-temporal dimensions, which leads to the notion of roll-up dependency (RUD). We show the applicability of RUDs in conceptual modeling and data mining. We then indicate that the notion of time granularity used in temporal databases is generally more expressive than the generalization hierarchies in multi-dimensional databases, and show how this surplus expressiveness can be introduced in non-temporal dimensions, which leads to the formalism of RUD with negation (RUD ̚ ). A complete axiomatization for reasoning about RUD ̚ is given.
TR-49: G. Slivinskas, C. S. Jensen, and R. T. Snodgrass, A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering
Most real-world databases contain substantial amounts of time-referenced, or temporal, data. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query representation, optimization, and processing mechanisms must be provided. This paper presents a foundation for query optimization that integrates conventional and temporal query optimization and is suitable for both conventional DBMS architectures and ones where the temporal support is obtained via a layer on top of a conventional DBMS. This foundation captures duplicates and ordering for all queries, as well as coalescing for temporal queries, thus generalizing all existing approaches known to the authors. It includes a temporally extended relational algebra to which SQL and temporal SQL queries may be mapped, six types of algebraic equivalences, concrete query transformation rules that obey different equivalences, a procedure for determining which types of transformation rules are applicable for optimizing a query, and a query plan enumeration algorithm.
The presented approach partitions the work required by the database implementor to develop a provably correct query optimizer into four stages: the database implementor has to (1) specify operations formally; (2) design and prove correct appropriate transformation rules that satisfy any of the six equivalence types; (3) augment the mechanism that determines when the different types of rules are applicable to ensure that the enumeration algorithm applies the rules correctly; and (4) ensure that the mapping generates a correct initial query plan.
TR-50: D. Zhang, V. Tsotras, and B. Seeger, A Comparison of Indexed Temporal Joins
We examine temporal joins in the presence of indexing schemes. Utilizing an index when processing join queries is especially advantageous if the join predicates involve only a portion of the temporal relations. This is a novel problem since temporal indices have various characteristics that can affect join processing drastically. For example, temporal indices commonly introduce record copies to achieve better clustering of records with long intervals. We first identify such issues and show that naïve approaches do not work. We concentrate on common temporal join queries and examine various index-based algorithms for processing them. Two optimization techniques for indexed joins are proposed, namely, the the balancing condition optimization and the virtual height optimization. While we use the Multiversion B+-tree as the temporal index, our results apply to other efficient tree-based temporal indices. We also compare against using a traditional R*-tree as the join index and a Spatially Partitioned Join approach. Based on the temporal data characteristics and the type of temporal join query, we identify the best approach to process the join, a useful outcome for a temporal query optimizer.
TR-51: S.-Y. Chien, V. J. Tsotras, and C. Zaniolo, A Comparative Study of Version Managemant Schemes for XML Documents
The problem of managing multiple versions for XML documents and semistructured data is of significant interest in many DB applications and web-related services. Traditional document version control schemes, such as RCS, suffer from the following two problems. At the logical level, they conceal the structure of the documents by modeling them as sequences of text lines, and storing a document’s evolution as a line-edit script. At the physical level, they can incur in severe storage or processing costs because of their inability to trade-off storage with computation. To solve these problems, we propose version management strategies that preserve the structure of the original document, and apply and extend DB techniques to minimize storage and processing costs. Therefore, we propose and compare three schemes for XML version management, namely, the Usefulness-Based Copy Control, the Multiversion B-Tree, and the Partially Persistent List Method. A common characteristic of these schemes is that they cluster data using the notion of page usefulness, which by selectively copying current information from obsolete pages provides for fast version reconstruction with minimal storage overhead. The cost and performance of these version management schemes are evaluated and compared through extensive analysis and experimentation.
TR-52: D. Zhang, A. Markowetz, V. Tsotras, D. Gunopulos, and B. Seeger, Efficient Computation of Temporal Aggregates with Range Predicates
A temporal aggregation query is an important but costly operation for applications that maintain time-evolving data (data warehouses, temporal databases, etc.). Due to the large volume of such data, performance improvements for temporal aggregation queries are critical. In this paper we examine techniques to compute temporal aggregates that include key-range predicates (range temporal aggregates). In particualar we concentrate on SUM, COUNT and AVG aggregates. This problem is novel; to handle arbitrary key ranges, previous methods would need to keep a seperate index for every possible key range. We propose an approach based on the new index structure called the Multiversion SB-Tree, which incorporates features from both the SB-Tree and the Multiversion B-Tree, to handle arbitrary key-range temporal SUM, COOUNT and AVG queries. We analyze the performance of our approach and present experimental results that show its efficiency.
TR-53: M. Dumas, M.-C. Fauvet, and P.-C. Scholl, TEMPOS: A Platform for Developing Temporal Applications on top of Object DBMS
This paper presents TEMPOS1 , a set of models and languages intended to seamlessly extend the ODMG object database standard with temporal functionalities. The proposed models exploit objectoriented technology to meet some important, yet traditionally neglected design criteria, related to legacy code migration and representation independence.
Two complementary ways for accessing temporal data are offered: a query language and a visual browser. The former one, namely TEMPOQL, is an extension of OQL supporting the manipulation of histories regardless of their representations, by composition of functional constructs. Thereby, the abstraction principle of object-orientation is fulfilled, and the functional nature of OQL is enforced. The visual browser on the other hand, offers operators which support several time-related interactive navigation tasks, such as studying a snapshot of a collection of objects at a given instant, or detecting and examining changes within temporal attributes and relationships.
TEMPOS models and languages have been fully formalized both at the syntactical and the semantical level and have been implemented on top of the O_2 DBMS. Their suitability with regard to applications’ requirements has been validated through concrete case studies.
TR-54: G. Kollios, D. Gunopulos, V. Tsotras, A. Delis, andM. Hadjieleftheriou, Indexing Animated Objects Using Spatiotemporal Access Methods
We present a new approach for indexing animated objects and efficiently answering queries about their position in time and space. In particular, we consider an animated movie as a spatiotemporal evolution. A movie is viewed as an oredered sequence of frames, where each frame is a 2-dimensional space occupied by the objects that appear in that frame. The queries of interests are range queries of the form: “find the objects that appear in an area S between frames fi and fj“, as well as nearest neighbor queries like: “find the q nearest objects to a given position A between frames Fi and fj“. The straightforward approach to index such objects considers the frame sequence as another dimension and uses a 3-dimensional access method (like an R-Tree or its variants). This however assigns long “lifetime” intervals to objects athat appear through many consecutive frames. Long intervals are difficult to cluster effeciently in a 3-dimensional index. Instead, we propose to reduce the problem to a partial-persistence problem. Namely, we use a 2-dimensional access method that is made partially persistent. We show that this approach leads to faster query performance while still using storatge proportional to the total number of changes in the frame evolution. WHat differentiates this from traditional temporal indexing approaches is that the objects are allowed to move and/or change their extent continuously between frames. We present novel methods to approximate such object evolutions. We formulate an optimization problem for whiich we provide an optimal solution for the case where objects mvoe linearly. Finally, we present an extensive experimental study of the proposed methods. While we concentrate on animated movies, our approach is general and can be applied to other spatiotemporal applications as well.
TR-55: V. Khatri, S. Ram, R. T. Snodgrass, and G. O’Brien, Supporting User-defined Granularities and Indeterminacy in a Spatiotemporal Conceptual Model
Granularities are integral to spatial and temporal data. A large number of applications require storage of facts along with their temporal and spatial context, which needs to be expressed in terms of appropriate granularities. For many real-world applications, a single granularity in the database is insufficient. In order to support any type of spatial or temporal reasoning, the semantics related to granularities needs to be embedded in the database. Specifying granularities related to facts is an important part of conceptual database design because underspecifying the granularity can restrict an application, affect the relative ordering of events, and impact the topological relationships. Closely related to granularities is indeterminacy, i.e., an occurrence time or location associated with a fact that is not known exactly. In this paper, we present an ontology for spatial granularities that is a natural analog of temporal granularities. We propose an upward-compatible, annotation-based spatiotemporal conceptual model that can comprehensively capture the semantics related to spatial and temporal granularities, and indeterminacy without requiring new spatiotemporal constructs. We specify the formal semantics of this spatiotemporal conceptual model via translation to a conventional conceptual model. To underscore the practical focus of our approach, we describe an on-going case study. We apply our approach to a hydrogeologic application at the United States Geologic Survey and demonstrate that our proposed granularitybased spatiotemporal conceptual model is straightforward to use and is comprehensive.
TR-56: G. Slivinskas, C. S. Jensen, and R. T. Snodgrass, Adaptable Query Optimization and Evaluation in Temporal Middleware
Time-referenced data are pervasive in most real-world databases. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query optimization and evaluation mechanisms must be provided, either within the DBMS proper or as a source level translation from temporal queries to conventional SQL. This paper proposes a new approach: using a middleware component on top of a conventional DBMS. This component accepts temporal SQL statements and produces a corresponding query plan consisting of algebraic as well as regular SQL parts. The algebraic parts are processed by the middleware, while the SQL parts are processed by the DBMS. The middleware uses performance feedback from the DBMS to adapt its partitioning of subsequent queries into middleware and DBMS parts. The paper describes the architecture and implementation of the temporal middleware component, termed TANGO, which is based on the Volcano extensible query optimizer and the XXL query processing library. Experiments with the system demonstrate the utility of the middleware‘s internal processing capability and its cost-based mechanism for apportioning the processing between the middleware and the underlying DBMS.
TR-57: D. Pfoser and C. S. Jensen, Querying the Trajectories of On-Line Mobile Objects
Position data is expected to play a central role in a wide range of mobile computing applications, including advertising, leisure, safety, security, tourist, and traffic applications. Applications such as these are characterized by large quantities of wirelessly Internet-worked, position-aware mobile objects that receive services where the objects’ position is essential. The movement of an object is captured via sampling, resulting in a trajectory consisting of a sequence of connected line segments for each moving object. This paper presents a technique for querying these trajectories. The technique uses indices for the processing of spatiotemporal range queries on trajectories. If object movement is constrained by the presence of infrastructure, e.g., lakes, park areas, etc., the technique is capable of exploiting this to reduce the range query, the purpose being to obtain better query performance. Specifically, an algorithm is proposed that segments the original range query based on the infrastructure contained in its range. The applicability and limitations of the proposal are assessed via empirical performance studies with varying datasets and parameter settings.
TR-58: K. H. Ryu and Y. A. Ahn, Application of Moving Objects and Spatiotemporal Reasoning
In order to predict future variations of moving objects which general attributes, locations, and regions of spatial objects are changed over time, spatiotemporal data, domain knowledge, and spatiotemporal operations are required to process together with temporal and spatial attributes of data. However, conventional researches on temporal and spatial reasoning cannot be applied directly to the inference using moving objects, because they have been studied separately on temporal or spatial attribute of data. Therefore, in this paper, we not only define spatial objects in time domain but also propose a new type of moving objects and spatiotemporal reasoning model that has the capability of operation and inference for moving objects. The proposed model is made up of spatiotemporal database, GIS tool, and inference engine for application of spatiotemporal reasoning using moving objects and they execute operations and inferences for moving objects. Finally, to show the applicability of the proposed model, a proper domain is established for the battlefield analysis system to support commander’s decision making in the army operational situation and it is experimented with this domain.
TR-59: D. Pfoser and N. Tryfona, Capturing Fuzziness and Uncertainty of Spatiotemporal Objects
For the majority of spatiotemporal applications, we assume that the modeled world is precise and bounded. Although this simplification is sufficient for applications like the cadastral system, it seems to be unnecessary crude for many other applications handling spatial and temporal extents, such as navigational applications. In this work, we explore fuzziness and uncertainty, which we subsume under the term indeterminacy, in the spatiotemporal context. We first show how the fundamental modeling concepts of spatial objects, attributes, relationships, time points, time periods, and events are influenced by indeterminacy, and show subsequently how these concepts can be combined. Next, we focus on the change of spatial objects and their geometry in time. We outline four scenarios, which identify discrete and continuous change, and we present how to model indeterminate change. We demonstrate the applicability of this proposal by describing the uncertainty related to the movement of point objects, such as the recording of the whereabouts of taxis.
TR-60: P. Terenziani and R. T. Snodgrass, Reconciling Point-based and Interval-based Semantics in Temporal Relational Databases: A Proper Treatment of the Telic/Atelic Distinction
The analysis of the semantics of temporal data and queries plays a central role in the area of temporal databases. Although many different algebræ and models have been proposed, almost all of them are based on a point-based (snapshot) semantics for data. On the other hand, in the areas of linguistics, philosophy, and, recently, artificial intelligence, a debated issue concerns the use of an interval-based versus a point-based semantics. In this paper, we first show some problems inherent in the adoption of a point-based semantics for data, and argue that these problems arise because there is no distinction drawn in the data between telic and atelic facts. We then introduce a three-sorted temporal model and algebra which properly copes with these issues, and which achieves a great flexibility via the introduction of coercion functions for transforming relations of one sort into relations of the other at query time. We show that it is possible to extend SQL/Temporal in a minimal fashion to support this augmented algebra.
TR-61: J. Skyt, C. S. Jensen, and T. B. Pedersen, Specification-Based Data Reduction in Dimensional Data Warehouses
Many data warehouses contain massive amounts of data and grow rapidly. Examples include warehouses with retail sales data capturing customer behavior and warehouses with click-stream data capturing user behavior on web sites. The sheer size of these warehouses makes them increasingly hard to manage and query efficiently. As time passes, old, detailed data in the warehouses tend to become less interesting. However, higher-level summaries of the data, taking up far less space, continue to be of interest. Thus, it is possible to perform data reduction in dimensional data warehouses by aggregating data to higher levels in the dimensions.
This paper presents an effective technique for data reduction that handles the gradual change of the data from new, detailed data to older, summarized data. The technique enables huge storage gains while ensuring the retention of essential data. The data reduction is based on formal specifications of when data should be aggregated to higher levels. Care is taken to ensure that the irreversible data reductions are without semantic problems. It is defined how queries over the resulting data with varying levels of detail are handled, and a strategy for implementing the technique using standard data warehouse technology is described.
TR-62: W. Li, D. Gao, and R. T. Snodgrass, Skew Handling Techniques in Sort-Merge Join
Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. This paper examines the negative ramifications of skew in sort-merge, and proposes several refinements of sort-merge join that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew, and are thereby suitable for replacing existing sort-merge implementations in relational DBMSs. We also show how band sort-merge join performance is significantly enhanced with these refinements.
TR-63: S. Saltenis and C. S. Jensen, Indexing of Moving Objects for Location-Based Services
With the continued proliferation of wireless networks, e.g., based on such evolving standards as WAP and Bluetooth, visionaries predict that the Internet will soon extend to billions of wireless devices, or objects. A substantial fraction of these will offer their changing positions to the (location-based) services, they either use or support. As a result, software technologies that enable the management of the positions of objects capable of continuous movement are in increasingly high demand. This paper assumes what we consider a realistic Internet-service scenario where objects that have not reported their position within a specified duration of time are expected to no longer be interested in, or of interest to, the service. In this scenario, the possibility of substantial quantities of “expiring” objects introduces a new kind of implicit update, which contributes to rendering the database highly dynamic. The paper presents an R-tree based technique for the indexing of the current positions of such objects. Extensive performance experiments explore the properties of the types of bounding regions that are candidates for being used in the internal entries of the index, and they show that, when compared to the approach where the objects are not assumed to expire, the new indexing technique can improve the search performance by as much as a factor of two or more without sacrificing update performance.
TR-64: V. Khatri, S. Ram and R. T. Snodgrass, ST USM: Bridging the Semantic Gap with a Spatio-Temporal Conceptual Model
The representation of geospatial phenomena in databases is one of the key issues in applications like geo-marketing, environmental modeling, transportation planning and geology. For these geospatial applications, there is a need for abstract concepts that would bridge the conceptual gap between the real world and its spatio-temporal representation in the computer systems. To capture the semantics related to space and time in a conceptual schema, we propose an annotation-based approach that allows a database designer to focus first on the non-temporal and non-spatial aspects of the application, and subsequently augment the schema with spatio-temporal annotations. In this paper, we apply our annotation-based approach to the Unifying Semantic Model (USM) to propose the Spatio-Temporal Unifying Semantic Model (ST USM). ST USM is an upward-compatible, snapshot reducible, annotation-based spatio-temporal conceptual model that can comprehensively capture the semantics related to space and time without adding any new spatio-temporal constructs. We provide the formal semantics of ST USM via a mapping to conventional USM and constraints, from which the logical schema can be derived. We illustrate practical aspects related to our spatiotemporal design methodology with a hydrogeologic application at the United States Geological Survey (USGS) and a web-based design tool, thereby demonstrating simplicity and comprehensiveness to spatio-temporal conceptual modeling.
TR-65: J. Skyt and C. S. Jensen, Persistent Views – a Mechanism for Managing Aging Data
Enabled by the continued advances in storage technologies, the amounts of on-line data grow at a rapidly increasing pace. This development is witnessed in, e.g., so-called data webhouses that accumulate click streams from portals, and in other data warehouse-type applications. The presence of very large and continuously growing amounts of data introduces new challenges, one of them being the need for effective management of aged data. In very large and growing databases, some data eventually becomes inaccurate or outdated, and may be of reduced interest to the database applications. This paper offers a mechanism, termed persistent views, that aids in flexibly reducing the volume of data, e.g., by enabling the replacement of such “low-interest,” detailed data with aggregated data. The paper motivates persistent views and precisely defines and contrasts these with the related mechanisms of views, snapshots, and physical deletion. The paper also offers a provably correct foundation for implementing persistent views.
TR-66: R. Benetis, C. S. Jensen, G. Karciauskas, and S. Saltenis, Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects
With the proliferation of wireless communications and the rapid advances in technologies for tracking the positions of continuously moving objects, algorithms for efficiently answering queries about large numbers of moving objects increasingly are needed. One such query is the reverse nearest neighbor (RNN) query that returns the objects that have a query object as their closest object. While algorithms have been proposed that compute RNN queries for non-moving objects, there have been no proposals for answering RNN queries for continuously moving objects. Another such query is the nearest neighbor (NN) query, which has been studied extensively and in many contexts. Like the RNN query, the NN query has not been explored for moving query and data points. This paper proposes an algorithm for answering RNN queries for continuously moving points in the plane. As a part of the solution to this problem and as a separate contribution, an algorithm for answering NN queries for continuously moving points is also proposed. The results of performance experiments are reported.
TR-67: J. Wijsen and A. Bès, On Query Optimization in a Temporal SPC Algebra
Tuples of a temporal relation are equipped with a valid time period. A simple extension of the SPC (Selection-Projection-Cross product) algebra for temporal relations is defined, which conforms to primitives in existing temporal query languages. In particular, temporal projection involves coalescing of time intervals, which results in non-monotonic queries. Also the “select-from-where” normal form is no longer available in this temporal extension. In view of these temporal peculiarities, it is natural and significant to ask whether query optimization techniques for the SPC algebra still apply in the temporal case. To this extent, we provide a temporal extension of the classical tableau formalism, and show its use and limits for temporal query optimization.
TR-68: F. Grandi and F. Mandreoli, A Formal Model for Temporal Schema Versioning in Object-Oriented Databases
The problem of supporting temporal schema versioning has been extensively studied in the context of the relational model. In the object-oriented environment, previous works were devoted to the study of the different aspects of schema evolution or (non-temporal) versioning in branching models, due to the traditional origination of the object-oriented model from CAD/CAM and CIM. Nowadays, the common use of the object-oriented model for a wide class of applications, extends temporal versioning requirements and expectations also to this model.
In this paper we present a formal model for the support of temporal schema versions in object-oriented databases. Its definition is partially based on a generic (ODMG compatible) object model and partially introduces new concepts. The proposed model supports all the schema changes which are usually considered in the OODB literature, for which an operational semantics and a formal analysis of their correct behaviour is provided. Semantic issues arising from the introduction of temporal schema versioning in a conventional or temporal database (concerning the interaction between the intensional and extensional levels of versioning and the management of data in the presence of multiple schema versions) are also considered.
TR-69: H. Gregersen and C. S. Jensen, On the Ontological Expressiveness of Temporal Extensions to the Entity-Relationship Model
It is widely recognized that temporal aspects of database schemas are prevalent, but also difficult to capture using the ER model. The database research community’s response has been to develop temporally enhanced ER models. However, these models have not been subjected to systematic evaluation. In contrast, the evaluation of modeling methodologies for information systems development is a very active area of research in information systems engineering community, where the need for systematic evaluations of modeling methodologies is well recognized.
Based on a framework from information systems engineering, this paper evaluates the ontological expressiveness of three different temporal enhancements to the ER model, the Entity-Relation-Time model, the TERC+ model, and the Time Extended ER model. Each of these temporal ER model extensions is well-documented, and together the models represent a substantial range of the design space for temporal ER extensions. The evaluation considers the uses of the models for both analysis and design, and the focus is on how well the models capture temporal aspects of reality as well as of relational database designs.
TR-70: G. Slivinskas and C. S. Jensen, Enhancing an Extensible Query Optimizer with Support for Multiple Equivalence Types
Database management systems are continuously being extended with support for new types of data and more advanced querying capabilities. In large part because of this, query optimization has remained a very active area of research throughout the past two decades. At the same time, current commercial optimizers are hard to modify, to incorporate desired changes in, e.g., query algebras, transformation rules, search strategies. This has led to a number of research contributions that aim at creating extensible query optimizers. Examples include Starburst, Volcano, and OPT++.
This paper reports on a study that has enhanced Volcano to support a relational algebra with added temporal operators, such as temporal join and aggregation. This includes the handling of algorithms and cost formulas for these new operators, six types of query equivalences, and accompanying query transformation rules. The paper shows how the Volcano search-space generation and plan-search algorithms were extended to support the six equivalence types, describes other key implementation tasks, and evaluates the extensibility of Volcano.
TR-71: D. Gao, C. S. Jensen, R. T. Snodgrass, and M. D. Soo, Join Operations in Temporal Databases
Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the evaluation of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors indicate that specialized techniques are needed to efficiently evaluate temporal joins.
We address this need for efficient join evaluation in temporal databases. Our purpose is two-fold. We first survey all previously proposed temporal join operators. While many temporal join operators have been defined in previous work, this work has been done largely in isolation from competing proposals, with little, if any, comparison of the various operators. We then address evaluation algorithms, comparing the applicability of various algorithms to the temporal join operators, and describing a performance study involving algorithms for one important operator, the temporal equijoin. Our focus, with respect to implementation, is on non-index based join algorithms. Such algorithms do not rely on auxiliary access paths, but may exploit sort orderings to achieve efficiency.
TR-72: D. Gao and R. T. Snodgrass, Syntax, Semantics, and Query Evaluation of the tXQuery Temporal XML Query Language
As with relational data, XML data changes over time with the creation, modification, and deletion of XML documents. Expressing queries on time-varying (relational or XML) data is more difficult than writing queries on nontemporal data. In this paper, we present a temporal XML query language, τ XQuery, in which we add valid time support to XQuery by minimally extending the syntax and semantics of XQuery. We prove that τ XQuery is XQuery-complete and sequenced queries are snapshot reducible to XQuery. We adopt a stratum approach which maps a τ XQuery query to a conventional XQuery. The paper focuses on how to perform this mapping, in particular, on mapping sequenced queries, which are by far the most challenging. The critical issue of supporting sequenced queries (in any query language) is time-slicing the input data while retaining period timestamping. Timestamps are distributed throughout an XML document, rather than uniformly in tuples, complicating the temporal slicing while also providing opportunities for optimization. We propose four optimizations of our initial maximally-fragmented time-slicing approach: selected node slicing, copy-based per-expression slicing, in-place per-expression slicing, and idiomatic slicing, each of which reduces the number of constant periods over which the query is evaluated. While performance tradeoffs clearly depend on the underlying XQuery engine, we argue that there are queries that favor each of the five approaches. Some example τXQuery queries are mapped to XQuery and are run on an XQuery engine, Galax. The query results obtained from different representations of the temporal information are snapshot equivalent.
TR-73: V. Khatri, S. Ram, and R. T. Snodgrass, Augmenting Database Design-Support Environments to Capture the Spatio-Temporal Data Semantics
Many real world applications need to organize data based on time (e.g., accounting, portfolio management, personnel management, inventory management) and/or space (e.g., facility management, market analysis, transportation, logistics). Underlying these applications is temporal and/or spatial data, referred to as spatio-temporal data. Conventional conceptual models provide a mechanism to elicit data semantics related to “what” is important for an application rather than the “when” and “where” semantics. For spatio-temporal applications listed above, it is left to the database designers to discover, design and implement—on an ad-hoc basis—the temporal and spatial concepts that they need. We describe a database design-support environment that exemplifies our spatio-temporal conceptual design approach: (i) first capture current reality using a conventional conceptual model without considering the temporal or spatial aspects (i.e., “what”); and only then (ii) annotate a conventional schema with spatio-temporal data semantics (i.e., “when/where”). We show how segregating “what” from “when/where” via annotations is straightforward to implement, satisfies ontology- and cognition-based requirements, dovetails with existing database design methodologies, and can support seamless integration of schemas across diverse design-support environments having different syntax but the same underlying semantics. We demonstrate how such an overall design-support environment that supports elicitation of the spatio-temporal semantics can help bridge the semantic gap between the real world and its spatio-temporal representation in the information systems.
TR-74: H. Liu, tDOM: A Time-Aware API for Managing Temporal XML Documents
TR-75: F. Grandi, An Annotated Bibliography on Temporal and Evolution Aspects in the WorldWideWeb
TR-76: M. Sethuraman, Implementation and Evaluation of a Partitioned Store for Transaction-Time Databases
There has been an increasing need for transaction-time databases. Such databases provide the ability to maintain the history of the data as it evolves. Along with this ability comes the problem of reduction in performance of current queries, over an increasing amount of data. Temporally partitioning the store of a transaction-time database provides a promising solution to this problem. We implemented a temporally partitioned store in the transaction-time database τ BerkeleyDB and evaluated its effect on a wide range of queries and update transactions. We also propose changes to MySQL language to provide transaction-time support and have modified τMySQL to support nonsequenced select and a temporally partitioned store.
TR-77: I. Fernando, V. López, R. T. Snodgrass, and B. Moon, Spatiotemporal Aggregate Computation: A Survey
Spatiotemporal databases are becoming increasingly more common. Typically, applications modeling spatiotemporal objects need to process vast amounts of data. In such cases, generating aggregate information from the data set is more useful than individually analyzing every entry. In this paper, we study the most relevant techniques for the evaluation of aggregate queries on spatial, temporal, and spatiotemporal data. We also present a model that reduces the evaluation of aggregate queries to the problem of selecting qualifying tuples and the grouping of these tuples into collections on which an aggregate function is to be applied. This model give us a framework that allows us to analyze and compare the different existing techniques for the evaluation of aggregate queries. At the same time, it aallows us to identify opportunities of research on types of aggregate queries that have not been studied.
TR-78: M. Pelanis, S. Saltenis, and C. S. Jensen, Indexing the Past, Present and Anticipated Future Positions of Moving Objects
With the proliferation of wireless communications and geo-positioning, e-services are envisioned that exploit the positions of a set of continuously moving users to provide context-aware functionality to each individual user. Because advances in disk capacities continue to outperform Moore’s Law, it becomes increasingly feasible to store on-line all the position information obtained from the moving e-service users. With the much slower advances in I/O speeds and many concurrent users, indexing techniques are of essence in this scenario.
Past indexing techniques capture the position of an object up until the time of the most recent position sample, or they represent an object’s position as a constant or linear function of time and capture the position from the current time and into the (near) future. This paper offers an indexing technique capable of capturing the positions of moving objects at all points in time. The index substantially extends partial persistence techniques, which support transaction time, to support valid time for monitoring applications. The performance of a query is independent of the number of past position samples stored for an object. No existing indices exist with these characteristics.
TR-79: C. S. Jensen, H. Lahrmann, S. Pakalnis, and J. Runge, The INFATI Data
The ability to perform meaningful empirical studies is of essence in research in spatio-temporal query processing. Such studies are often necessary to gain detailed insight into the functional and performance characteristics of proposals for new query processing techniques. We present a collection of spatio-temporal data, collected during an intelligent speed adaptation project, termed INFATI, in which some two dozen cars equipped with GPS receivers and logging equipment took part. We describe how the data was collected and how it was “modified” to afford the drivers some degree of anonymity. We also present the road network in which the cars were moving during data collection. The GPS data is publicly available for non-commercial purposes. It is our hope that this resource will help the spatiotemporal research community in its efforts to develop new and better query processing techniques.
TR-80: B. Urgun, C. E. Dyreson, N. Kline, J. K. Miller, R. T. Snodgrass, M. D. Soo, and C. S. Jensen, Integrating Multiple Calendars using tZaman
Programmers world-wide are interested in developing applications that can be used internationally. Part of the internationalization effort is the ability to engineer applications to use dates and times that conform to local calendars yet can inter-operate with dates and times in other calendars, for instance between the Gregorian and Islamic calendars. τZAMAN is a system that provides a natural language and calendar-independent framework for integrating multiple calendars. τZAMAN performs “runtime-binding” of calendars and language support. A running τZAMAN system dynamically loads calendars and language support tables from XML-formatted files. Loading a calendar integrates it with other, already loaded calendars, enabling users of τZAMAN to add, compare, and convert times between multiple calendars. τZAMAN also provides a flexible, calendar-independent framework for parsing temporal literals. Literals can be input and output in XML or plain text, using user-defined formats, and in different languages and character sets. Finally, τZAMAN is a client/server system, enabling shared access to calendar servers spread throughout the web. This paper describes the architecture of τZAMAN and experimentally quantifies the cost of using a calendar server to translate and manipulate dates.
TR-81: F. Wang, X. Zhou, and C. Zaniolo, Using XML to Build Efficient Transaction-Time Temporal Database Systems on Relational Databases
Better support for temporal applications by database systems represents an important technical objective that is difficult to achieve since it requires an integrated solution for several problems, including (i) expressive temporal representations and data models, (ii) powerful languages for temporal queries and snapshot queries, (iii) indexing, clustering and query optimization techniques for managing temporal information efficiently, and (iv) architectures that bring together the different pieces of enabling technology into a robust system. In this paper, we present the ArchIS system that achieves these objectives by supporting a temporally grouped data model on top of RDBMS. ArchIS’ architecture uses (a) XML to support temporally grouped (virtual) representations of the database history, (b) XQuery to express powerful temporal queries on such views, (c) temporal clustering and indexing techniques for managing the actual historical data in a RDBMS, and (d) SQL/XML for executing the queries on the XML views as equivalent queries on the relational DB. The performance studies presented in the paper show that ArchIS is quite effective at storing and retrieving under complex query conditions the transaction-time history of relational databases. By supporting database compression as an option, ArchIS also assures excellent storage efficiency for archived histories. This approach achieves full-functionality transactiontime databases without requiring temporal extensions in XML or database standards.
TR-82: A. Schmidt and C. S. Jensen, Efficient Management of Short-Lived Data
Motivated by the increasing prominence of loosely-coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.
TR-83: F. Wang, X. Zhou, and C. Zaniolo, Efficient XML-based Techniques for Archiving, Querying and Publishing the Histories of Relational Databases
The preservation of digital artifacts represents an unanswered challenge for the modern information society; XML and its query languages provide an effective environment to address this challenge because of their ability to support temporal information and queries. In this paper, we focus on the problem of preserving, publishing, and querying efficiently the history of a relational database. Past research on temporal databases revealed the difficulty of achieving satisfactory solutions using flat relational tables and SQL. Here we show that the problem can be solved using (a) XML to support temporally grouped representations of the database history, and (b) XQuery to express powerful temporal queries on such representations. Furthermore, the approach is quite general and it can be used to preserve and query the history of multi-version XML documents. Then we turn to the problem of efficient implementation, and we investigate alternative approaches, including (i) XML DBMS, (ii) shredding XML into relational tables and using SQL/XML on these tables, (iii) SQL:2003 nested tables, and iv) OR-DBMS extended with XML support. These experiments suggest that a combination of temporal XML views and physical relational tables provides the best approach for managing temporal database information.
TR-84: D. Mallett, M. A. Nascimento, V. Botea, and J. Sander, RDBMS Support for Efficient Indexing of Historical Spatio-Temporal Point Data
Despite pressing need, current RDBMS support for spatiotemporal data is limited and inadequate, and most existing spatiotemporal access methods cannot be readily integrated into an RDBMS. This paper proposes SPIT, an adaptive technique for spatiotemporal storage, indexing and query support that can be fully integrated within any off-the-shelf RDBMS. We initially propose a cost model that assumes a uniform data distribution for determining an optimal partitioning of the data space in terms of query processing time. We then use this model as a basis for a heuristic method for partitioning the data space without making any assumption about the data distribution. Using Oracle as our implementation platform with both real and synthetic datasets, we show that SPIT is robust and significantly outperforms other RDBMS-based options for managing historical spatiotemporal data.
TR-85: P. Terenziani, R. T. Snodgrass, A. Bottrighi, M. Torchio, and G. Molino, Extending Temporal Databases to Deal with Telic/Atelic Medical Data
Objective. In this paper, we aim at defining a general-purpose data model and query language coping with both “telic” and “atelic” medical data.
Background. In the area of Medical Informatics, there is an increasing realization that temporal information plays a crucial role, so that suitable database models and query languages are needed to store and support it. However, despite the wide range of approaches in the area, in this paper we show that a relevant class of medical data cannot be properly dealt with.
Methodology. We first show that data models based on the “point-based” semantics, which is (implicitly or explicitly) assumed by the totality of temporal Database approaches, have several limitations when dealing with “telic” data. We then propose a new model (based on the “interval-based” semantics) to cope with such data, and extend the query language accordingly.
Results. We propose a new three-sorted model and a query language to properly deal with both “telic” and “atelic” medical data (as well as nontemporal data). Our query language is flexible, since it allows one to switch from “atelic” to “telic” data, and vice versa.
TR-86: C. S. Jensen, D. Tiesyte, and N. Tradisauskas, The COST Benchmark – Comparison and Evaluation of Spatio-Temporal Indexes
An infrastructure is emerging that enables the positioning of populations of on-line, mobile service users. In step with this, research in the management of moving objects has attracted substantial attention. In particular, quite a few proposals now exist for the indexing of moving objects, and more are underway. As a result, there is an increasing need for an independent benchmark for spatio-temporal indexes. This report characterizes the spatio-temporal indexing problem and proposes a benchmark for the performance evaluation and comparison of spatio-temporal indexes. Notably, the benchmark takes into account that the available positions of the moving objects are inaccurate, an aspect largely ignored in previous indexing research. The concepts of data and query enlargement are introduced for addressing inaccuracy. As proof of concepts of the benchmark, the report covers the application of the benchmark to three spatio-temporal indexes—the TPR-, TPR*-, and Bx-trees. Based on conceptual analyses of the indexes, performance hypotheses are formulated. Experimental results and consequent guidelines for the usage of these indexes are reported.
TR-87: K. E. Pavlou and R. T. Snodgrass, The Pre-images of Bitwise AND Functions in Forensic Analysis
Mechanisms now exist that detect tampering of a database, through the use of cryptographically-strong hash functions, as well as algorithms for forensic analysis of such tampering, thereby determining who, when, and what. This paper shows that for one such forensic analysis algorithm, the tiled bitmap algorithm, determining when and what, which goes far in determining who, has at its core computing the pre-image of bitwise AND functions. This paper introduces the notion of candidate set (the desired pre-image of a target binary number), provides a complete characterization of the candidate set, and characterizes the cardinality as well as the average cardinality. We then provide an optimal algorithm for computing the candidate set given a target. We show that given an auxiliary candidate array, the candidate set can be computed in constant time. This latter algorithm is applicable when the target is a suffix of another target number for which a candidate set has already been computed.
TR-88: S. Liu, F. Wang, and P. Liu, A Temporal RFID Data Model for Querying Physical Objects
RFID holds the promise of real-time identifying, locating, tracking and monitoring physical objects without line of sight, and can be used for a wide range of pervasive computing applications. To achieve these goals, RFID data have to be collected, transformed and expressively modeled as their virtual counterparts in the virtual world. RFID data, however, have their own unique characteristics – including aggregation, location, temporal and history-oriented – which have to be fully considered and integrated into the data model. The diversity of RFID applications pose further challenges to a generalized framework for RFID data modeling. In this paper, we explore the fundamental characteristics of RFID applications, and classify applications into a set of basic scenarios based on these characteristics. We then develop constructs for modeling each scenario, which then can be integrated to model most complex RFID applications in real world. We further demonstrate that our model provides powerful support on querying physical objects in RFID-based pervasive computing environment.
TR-89: S. Joshi, τXSchema – Support for Data- and Schema-Versioned XML Documents
By utilizing schema-constant periods and cross-wall validation, it is possible to realize a comprehensive system for representing and validating data- and schema-versioned XML documents, while remaining fully compatible with the XML standards.
TR-90: Christian S. Jensen and Richard T. Snodgrass (Editors), Temporal Database Entries for the Springer Encyclopedia of Database Systems
TR-91: Faiz Currim, Sabah Currim, Curtis E. Dyreson, Shailesh Joshi, Richard T. Snodgrass, Stephen W. Thomas, and Eric Roeder, τXSchema: Support for Data- and Schema-Versioned XML Documents
The W3C XML Schema recommendation defines the structure and data types for XML documents. XML Schema lacks explicit support for time-varying XML documents or for time-varying schemas. An XML document evolves as it is updated over time or as it accumulates from a streaming data source. A temporal document records the entire history of a document rather than just its current state or snapshot. Capturing a document’s evolution is vital to providing the ability to recover past versions, track changes over time, and evaluate temporal queries. Capturing the evolution of a document’s schema is similarly important. To date, users have to resort to ad hoc, non-standard mechanisms to create schemas for time-varying XML documents and to deal with evolving schemas.
This report presents a data model and architecture, called τ XSchema, for constructing and validating temporal XML documents through the use of a temporal schema. A temporal schema guides the construction of a temporal document and is essential to managing, querying, and validating temporal documents. The temporal schema consists of a non-temporal (conventional) schema, logical annotation(s), and physical annotation(s). The annotations specify which portion(s) of an XML document can vary over time, how the document can change, and where timestamps should be placed. These components can themselves individually evolve over time. The advantage of using annotations to denote the time-varying aspects is that logical and physical data independence for temporal schemas can be achieved while remaining fully compatible with both existing XML Schema documents and the XML Schema recommendation. This report also describes how to construct a temporal document by “gluing” individual snapshots into an integrated history.
This technical report is divided into three parts: concerning instance versioning, extending to schema versioning, and reviewing the entire τ XSchema language. The first two parts have a parallel struc- ture. Each begins by discussing relevant related work before providing a motivating example that illustrates the challenges of instance and schema versioning, respectively, then lists design decisions made in τ XSchema concerning that challenge. Theoretical considerations (separately for instance and schema versioning), architectural considerations, and implementation details are discussed in that order in each of the two parts. Each part ends with full example schema and instance documents. The third part completes the picture with a discussion of related work and research topics to be considered in the future.
TR-92: Stephen W. Thomas, Richard T. Snodgrass, and Rui Zhang, τBench: Extending XBench with Time
TR-92a: Stephen W. Thomas, Richard T. Snodgrass, and Rui Zhang, τBench: Extending XBench with Time
TR-93: Zouhaier Brahmia, Rafik Bouaziz, Fabio Grandi, and Barbara Oliboni, Schema Versioning in τXSchema-Based Multitemporal XML Repositories
τXSchema [7] is a framework (a language and a suite of tools) for the creation and validation of timevarying XML documents. A τXSchema schema is composed of a conventional XML Schema document annotated with physical and logical annotations. All components of a τXSchema schema (i.e., conventional schema, logical annotations, and physical annotations) can change over time to reflect changes in user requirements or in reference world of the database. Since many applications need to keep track of both data and schema evolution, schema versioning has been long advocated to be the best solution to do this. In this paper, we deal with schema versioning in the τXSchema framework. More precisely, we propose a set of schema change primitives for the maintenance of logical and physical annotations and define their operational semantics.
TR-94: Zouhaier Brahmia, Rafik Bouaziz, Fabio Grandi, and Barbara Oliboni, A Study of Conventional Schema Versioning in the τXSchema Framework
Schema versioning is an indispensable feature for applications using temporal databases and requiring an entire history of data and schema. τ XSchema [6] is an infrastructure for constructing and validating temporal XML documents; but any explicit support for XML schema versioning is offered. A τ XSchema schema is composed of a conventional XML Schema document annotated with physical and logical annotations. All components of a τ XSchema schema (i.e., conventional schema, logical annotations, and physical annotations) can change over time to reflect changes in user requirements or in reference world of the database. In a previous work [9], we deal with versioning of logical and physical annotations. In this work, we study τ XSchema conventional schema versioning: we propose a complete set of low-level primitives for changing such a schema and define their operational semantics.
TR-95: Fabio Grandi, An Annotated Bibliography on Temporal and Evolution Aspects in the SemanticWeb
TR-96: Zouhaier Brahmia, Fabio Grandi, Barbara Oliboni, and Rafik Bouaziz, High-level Operations for Changing Temporal Schema, Conventional Schema and Annotations, in the τXSchema Framework
τXSchema [1] is a framework for constructing and validating time-varying XML documents through the use of a temporal schema. This latter ties together a conventional schema (i.e., a standard XML Schema document) and its corresponding logical and physical annotations, which are stored in an annotation document. Conventional schema and logical and physical annotations undergo changes to respond to changes in user requirements or in the real-world. Consequently, the corresponding temporal schema is also evolving over time. In this report, we study operations which help designers for changing such schema components. Indeed, we propose three sets of high-level operations for changing temporal schema, conventional schema, and annotations. These high-level operations are based on the low-level operations proposed in [2,3], [4,5] and [18]. They are also consistency preserving and more user-friendly than the low-level operations. Besides, we have divided the proposed operations into basic high-level operations (i.e., high-level operations that cannot be defined by using other basic high-level operations) and complex ones.
TR-97: Hind Hamrouni, Zouhaier Brahmia, and Rakik Bouaziz, An Efficient Approach for Detecting and Repairing Data Inconsistencies Resulting from Retroactive Updates in Multi-temporal and Multi-version XML Databases
Multi-temporal XML databases supporting schema versioning contain XML elements of different temporal formats (snapshot, transaction-time, valid-time, and bitemporal), defined under several XML schema versions. These databases support three types of data updates concerned with the time when updates are made: retroactive, proactive, or on-time, dealing with past, future, or current data respectively. A retroactive update (i.e., modifying or deleting a past element) due to a detected error means that the database has included erroneous information during some period in the past and, therefore, its consistency should be restored by correcting all errors and inconsistencies that have occurred in the past. Indeed, all processings that have been carried out during the inconsistency period and have used erroneous information have normally produced erroneous information. In this work, we propose an approach which preserves data consistency in multi-temporal and multi-version XML databases. More precisely, after any retroactive update, the proposed approach allows (i) detecting and analyzing periods of database inconsistency, which result from that update, and (ii) repairing of all inconsistencies and recovery of all side effects.