Accessing audio and video used to be a simple matter - simple because of the simplicity of the access mechanisms and because of the poverty of the sources. An immense amount of audiovisual information is becoming available in digital form - in digital archives, on the World Wide Web, in broadcast data streams, and in personal and professional databases - and this amount is only growing. The value of information often depends on how easily it can be found, retrieved, accessed, filtered and managed.
The transition between the second and third millennium abounds with new ways to produce, offer, filter, search, and manage digitized multimedia information. Broadband is being offered with increasing audio and video quality and speed of access. The trend is clear: in the next few years, users will be confronted with such a large volume of content, provided by multiple sources, that efficient and accurate access to this almost infinite amount of content seems unimaginable today. In spite of the fact that users have increasing access to these resources, identifying and managing them efficiently is becoming more difficult, because of the sheer volume. This applies to professional as well as end users. The question of identifying and managing content is not just restricted to database retrieval applications such as digital libraries, but extends to areas like broadcast channel selection, multimedia editing, and multimedia directory services.
This challenging situation demands a timely solution to the problem. MPEG-7 is the answer to this need.
MPEG-7 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group), the committee that also developed the successful standards known as MPEG-1 (1992) and MPEG-2 (1994), and the MPEG-4 standard (Version 1 in 1998, and version 2 in 1999). The MPEG-1 and MPEG-2 standards have enabled the production of widely adopted commercial products, such as Video CD, MP3, digital audio broadcasting (DAB), DVD, digital television (DVB and ATSC), and many video-on-demand trials and commercial services. MPEG-4 is the first real multimedia representation standard, allowing interactivity and a combination of natural and synthetic material coded in the form of objects (it models audiovisual data as a composition of these objects). MPEG-4 provides the standardized technological elements enabling the integration of the production, distribution and content access paradigms of the fields of interactive multimedia, mobile multimedia, interactive graphics and enhanced digital television.
The MPEG-7 standard, formally named "Multimedia Content Description Interface", provides a rich set of standardized tools to describe multimedia content. Both human users and automatic systems that process audiovisual information are within the scope of MPEG-7.
MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements, and their structure and relationships, defined by the standard in the form of Descriptors and Description Schemes). These tools are used to create descriptions (i.e., sets of instantiated Description Schemes and their corresponding Descriptors, at the user's discretion), which form the basis for applications enabling the needed effective and efficient access (search, filtering and browsing) to multimedia content. This is a challenging task given the broad spectrum of requirements and targeted multimedia applications, and the large number of audiovisual features of importance in this context.
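To make the relationship between Description Schemes and Descriptors concrete, the following sketch builds a much-simplified, hypothetical MPEG-7-style description: one instantiated scheme ("VideoSegment") holding two descriptor values. The element names are illustrative only and do not reproduce the normative MPEG-7 schema or namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MPEG-7-style description: a Description Scheme
# ("VideoSegment") instantiated with two Descriptors. Names are invented.
segment = ET.Element("VideoSegment", id="seg1")
ET.SubElement(segment, "MediaTime").text = "T00:00:10-T00:00:25"
ET.SubElement(segment, "DominantColor").text = "128 64 32"

xml_text = ET.tostring(segment, encoding="unicode")
print(xml_text)
```

A real description would be validated against the MPEG-7 schema; here the point is only that a description is structured metadata attached to a piece of content, not the content itself.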
MPEG-7 has been developed by experts representing broadcasters, electronics manufacturers, content creators and managers, publishers, intellectual property rights managers, telecommunication service providers and academia.
More information about MPEG-7 can be found at the MPEG website (chiariglione.org/mpeg), the MPEG-7 Industry Forum website (mpegif.com), and the MPEG-7 Consortium website (mpeg7.nist.gov). These web pages contain links to a wealth of information about MPEG, including much about MPEG-7, many publicly available documents, several lists of 'Frequently Asked Questions' and links to other MPEG-7 web pages.
1.1 Context of MPEG-7
Audiovisual information plays an important role in our society, be it recorded in such media as film or magnetic tape or originating, in real time, from audio or visual sensors, and be it analogue or, increasingly, digital. Every day, more and more audiovisual information becomes available from many sources around the world, represented in various forms (modalities) of media, such as still pictures, graphics, 3D models, audio, speech and video, and in various formats. While audio and visual information used to be consumed directly by the human being, there is an increasing number of cases where the audiovisual information is created, exchanged, retrieved, and re-used by computational systems. This may be the case for such scenarios as image understanding (surveillance, intelligent vision, smart cameras, etc.) and media conversion (speech to text, picture to speech, speech to picture, etc.). Other scenarios are information retrieval (quickly and efficiently searching for various types of multimedia documents of interest to the user) and filtering in a stream of audiovisual content description (to receive only those multimedia data items which satisfy the user preferences). For example, a code in a television program triggers a suitably programmed PVR (Personal Video Recorder) to record that program, or an image sensor triggers an alarm when a certain visual event happens. Automatic transcoding may be performed from a string of characters to audible information, or a search may be performed in a stream of audio or video data. In all these examples, the audiovisual information has been suitably "encoded" to enable a device or a computer code to take some action.
Audiovisual sources will play an increasingly pervasive role in our lives, and there will be a growing need to have these sources processed further. This makes it necessary to develop forms of audiovisual information representation that go beyond the simple waveform or sample-based, compression-based (such as MPEG-1 and MPEG-2) or even object-based (such as MPEG-4) representations. Forms of representation that allow some degree of interpretation of the information's meaning are necessary. These forms can be passed onto, or accessed by, a device or a computer code. In the examples given above, an image sensor may produce visual data not in the form of PCM samples (pixel values) but in the form of objects with associated physical measures and time information. These could then be stored and processed to verify whether certain programmed conditions are met. A PVR could receive descriptions of the audiovisual information associated with a program that would enable it to record, for example, only news, with the exclusion of sport. Products from a company could be described in such a way that a machine could respond to unstructured queries from customers making inquiries.
MPEG-7 is a standard for describing the multimedia content data that will support these operational requirements. The requirements apply, in principle, to both real-time and non real-time as well as push and pull applications. MPEG-7 does not standardize or evaluate applications, although in the development of the MPEG-7 standard applications have been used for understanding the requirements and evaluation of technology. It must be made clear that the requirements are derived from analyzing a wide range of potential applications that could use MPEG-7 tools. MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardizes support as broad a range of applications as possible.
1.2 MPEG-7 Objectives
In October 1996, MPEG started a new work item to provide a solution to the questions described above. The new member of the MPEG family, named "Multimedia Content Description Interface" (in short, MPEG-7), provides standardized core technologies allowing the description of audiovisual data content in multimedia environments. It extends the limited capabilities of the proprietary solutions for identifying content that exist today, notably by including more data types.
Audiovisual data content that has MPEG-7 descriptions associated with it, may include: still pictures, graphics, 3D models, audio, speech, video, and composition information about how these elements are combined in a multimedia presentation (scenarios). A special case of these general data types is facial characteristics.
MPEG-7 descriptions do not, however, depend on the way the described content is coded or stored. It is possible to create an MPEG-7 description of an analogue movie, or of a picture printed on paper, in the same way as of digitized content.
MPEG-7 allows different granularity in its descriptions, offering the possibility to have different levels of discrimination. Even though the MPEG-7 description does not depend on the (coded) representation of the material, MPEG-7 can exploit the advantages provided by MPEG-4 coded content. If the material is encoded using MPEG-4, which provides the means to encode audio-visual material as objects having certain relations in time (synchronization) and space (on the screen for video, or in the room for audio), it will be possible to attach descriptions to elements (objects) within the scene, such as audio and visual objects.
Because the descriptive features must be meaningful in the context of the application, they will be different for different user domains and different applications. This implies that the same material can be described using different types of features, tuned to the area of application. To take the example of visual material: a lower abstraction level would be a description of, e.g., shape, size, texture, color, movement (trajectory) and position ('Where in the scene can the object be found?'); and for audio: key, mood, tempo, tempo changes, position in sound space. The highest level would give semantic information: 'This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background.' Intermediate levels of abstraction may also exist.
The level of abstraction is related to the way the features can be extracted: many low-level features can be extracted in fully automatic ways, whereas high-level features need (much) more human interaction.
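A fully automatic low-level extraction can be sketched as follows: a coarse color histogram over raw pixels, in the spirit of (but far simpler than) the MPEG-7 color descriptors. The binning scheme and data layout are inventions of this sketch.

```python
# Sketch of automatic low-level feature extraction: quantize each RGB
# channel into a few bins and count pixels per (r, g, b) bin. This is an
# illustration, not any normative MPEG-7 color descriptor.
def color_histogram(pixels, bins_per_channel=2):
    step = 256 // bins_per_channel
    counts = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
    return counts

pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]  # two reddish, one blue
hist = color_histogram(pixels)
print(hist)  # {(1, 0, 0): 2, (0, 0, 1): 1}
```

By contrast, the high-level semantic description in the example above ('a barking brown dog on the left...') cannot be computed by such a simple procedure, which is exactly why it needs human interaction.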
Next to having a description of what is depicted in the content, it is also required to include other types of information about the multimedia data:
• The form - An example of the form is the coding format used (e.g. JPEG, MPEG-2), or the overall data size. This information helps determine whether the material can be 'read' by the user terminal;
• Conditions for accessing the material - This includes links to a registry with intellectual property rights information, and price;
• Classification - This includes parental rating, and content classification into a number of pre-defined categories;
• Links to other relevant material - This information may help the user speed up the search;
• The context - In the case of recorded non-fiction content, it is very important to know the occasion of the recording (e.g. Olympic Games 1996, final of 200 meter hurdles, men).
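The non-depictive information types above can be sketched as a single record per content item; a terminal can then use the 'form' information to decide whether it can play the material. All field names here are invented for the sketch, not MPEG-7 names.

```python
# Illustrative record combining the information types listed above
# (form, access conditions, classification, context). Names are invented.
item = {
    "form": {"coding_format": "MPEG-2", "size_bytes": 1_200_000_000},
    "access": {"rights_registry": "urn:example:rights:42", "price_eur": 3.50},
    "classification": {"parental_rating": "PG", "genre": "sports"},
    "context": {"occasion": "Olympic Games 1996, final of 200 meter hurdles, men"},
}

def terminal_can_read(item, supported_formats):
    """The 'form' information lets a terminal decide if it can play the item."""
    return item["form"]["coding_format"] in supported_formats

print(terminal_can_read(item, {"MPEG-1", "MPEG-2"}))  # True
print(terminal_can_read(item, {"MPEG-4"}))            # False
```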
The main elements of the MPEG-7 standard are:
• Description Tools: Descriptors (D), which define the syntax and the semantics of each feature (metadata element); and Description Schemes (DS), which specify the structure and semantics of the relationships between their components, which may be both Descriptors and Description Schemes;
• A Description Definition Language (DDL) to define the syntax of the MPEG-7 Description Tools, to allow the creation of new Description Schemes and, possibly, Descriptors, and to allow the extension and modification of existing Description Schemes;
• System tools, to support binary coded representation for efficient storage and transmission, transmission mechanisms (for both textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
Therefore, the MPEG-7 Description Tools allow the creation of descriptions (i.e., sets of instantiated Description Schemes and their corresponding Descriptors, at the user's discretion), the incorporation of application-specific extensions using the DDL, and the deployment of the descriptions using the System tools.
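Since the DDL is built on XML Schema, defining a new Descriptor amounts to writing a schema type. The fragment below is a hypothetical sketch of such a definition; the type and element names are invented, and the real DDL adds MPEG-7-specific extensions on top of plain XML Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical DDL-style definition of a new Descriptor type, expressed in
# plain XML Schema. Names are illustrative, not from the MPEG-7 schema.
ddl_fragment = """
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="DominantColorType">
    <sequence>
      <element name="RGB" type="string"/>
      <element name="Confidence" type="float" minOccurs="0"/>
    </sequence>
  </complexType>
</schema>
"""

root = ET.fromstring(ddl_fragment)
names = [ct.get("name")
         for ct in root.iter("{http://www.w3.org/2001/XMLSchema}complexType")]
print(names)  # ['DominantColorType']
```

The System tools would then take descriptions that instantiate such types and deliver them in textual or binary form.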
MPEG-7 descriptions of content may include:
• Information describing the creation and production processes of the content (director, title, short feature movie).
• Information related to the usage of the content (copyright pointers, usage history, broadcast schedule).
• Information on the storage features of the content (storage format, encoding).
• Structural information on spatial, temporal or spatio-temporal components of the content (scene cuts, segmentation in regions, region motion tracking).
• Information about low-level features in the content (colors, textures, sound timbres, melody description).
• Conceptual information about the reality captured by the content (objects and events, interactions among objects).
• Information about how to browse the content in an efficient way (summaries, variations, spatial and frequency subbands).
• Information about collections of objects.
• Information about the interaction of the user with the content (user preferences, usage history).
All these descriptions are of course coded in an efficient way for searching, filtering, etc.
To accommodate this variety of complementary content descriptions, MPEG-7 approaches the description of content from several viewpoints. The sets of Description Tools developed for those viewpoints are presented here as separate entities. However, they are interrelated and can be combined in many ways. Depending on the application, some will be present while others will be absent or only partly present.
A description generated using MPEG-7 Description Tools will be associated with the content itself, to allow fast and efficient searching for, and filtering of material that is of interest to the user.
MPEG-7 data may be physically located with the associated AV material, in the same data stream or on the same storage system, but the descriptions could also live somewhere else on the globe. When the content and its descriptions are not co-located, mechanisms that link the multimedia material and their MPEG-7 descriptions are needed; these links will have to work in both directions.
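The bidirectional linking the text calls for can be sketched as a pair of indexes: one from a content locator to its (possibly remote) description, and the reverse index so a description can be traced back to its content. The URIs are invented for illustration.

```python
# Sketch of two-way content/description linking. Real MPEG-7 descriptions
# carry media locators; the registry structure here is an invention.
content_to_desc = {
    "http://example.org/video/match.mpg": "http://example.org/mpeg7/match.xml",
}
# Build the reverse index so links work in both directions.
desc_to_content = {d: c for c, d in content_to_desc.items()}

desc = content_to_desc["http://example.org/video/match.mpg"]
print(desc_to_content[desc])  # resolves back to the content URI
```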
MPEG-7 addresses many different applications in many different environments, which means that it needs to provide a flexible and extensible framework for describing audiovisual data. Therefore, MPEG-7 does not define a monolithic system for content description, but rather a set of methods and tools for the different viewpoints of the description of audiovisual content. With this in mind, MPEG-7 is designed to take into account the viewpoints under consideration by other leading standards such as, among others, TV-Anytime, Dublin Core, the SMPTE Metadata Dictionary, and EBU P/Meta. These standardization activities are focused on more specific applications or application domains, whilst MPEG-7 has been developed to be as generic as possible. MPEG-7 also uses XML as the language of choice for the textual representation of content descriptions, and XML Schema has been the basis for the DDL (Description Definition Language) that is used for the syntactic definition of MPEG-7 Description Tools and for allowing extensibility of Description Tools (either new MPEG-7 ones or application-specific ones). Considering the popularity of XML, using it will facilitate interoperability with other metadata standards in the future.
1.3 Scope of the Standard
MPEG-7 addresses applications that can be stored (on-line or off-line) or streamed (e.g. broadcast, push models on the Internet), and can operate in both real-time and non real-time environments. A 'real-time environment' in this context means that the description is generated while the content is being captured.
Figure 1 below shows a highly abstract block diagram of a possible MPEG-7 processing chain, included here to explain the scope of the MPEG-7 standard. This chain includes feature extraction (analysis), the description itself, and the search engine (application). To fully exploit the possibilities of MPEG-7 descriptions, automatic extraction of features will be extremely useful. It is also clear, however, that automatic extraction is not always possible. As noted above, the higher the level of abstraction, the more difficult automatic extraction is, and interactive extraction tools will be of good use. However useful they are, neither automatic nor semi-automatic feature extraction algorithms are within the scope of the standard. The main reason is that their standardization is not required for interoperability, and leaving them out leaves space for industry competition. Another reason not to standardize analysis is to allow good use to be made of the expected improvements in these technical areas.
Also the search engines, filter agents, or any other program that can make use of the description, are not specified within the scope of MPEG-7; again this is not necessary, and here too, competition will produce the best results.
Figure 1: Scope of MPEG-7
Figure 2: MPEG-7 main elements
Figure 2 shows the relationship among the different MPEG-7 elements introduced above. The DDL allows the definition of the MPEG-7 description tools, both Descriptors and Description Schemes, providing the means for structuring the Ds into DSs. The DDL also allows the extension for specific applications of particular DSs. The description tools are instantiated as descriptions in textual format (XML) thanks to the DDL (based on XML Schema). Binary format of descriptions is obtained by means of the BiM defined in the Systems part.
Figure 3: Abstract representation of possible applications using MPEG-7
Figure 3 explains a hypothetical MPEG-7 chain in practice [There can be other streams from content to user; these are not depicted here. Furthermore, the MPEG-7 Coded Description may be textual or binary, as there might be cases where an efficient binary representation of the description is not needed and a textual representation suffices.]. From the multimedia content, an audiovisual description is obtained via manual or semi-automatic extraction. The AV description may be stored (as depicted in the figure) or streamed directly. If we consider a pull scenario, client applications will submit queries to the descriptions repository and will receive a set of descriptions matching the query for browsing (just for inspecting the description, for manipulating it, for retrieving the described content, etc.). In a push scenario, a filter (e.g., an intelligent agent) will select descriptions from the available ones and perform the programmed actions afterwards (e.g., switching a broadcast channel or recording the described stream). In both scenarios, all the modules may handle descriptions coded in MPEG-7 formats (either textual or binary), but MPEG-7 conformance is required only at the indicated conformance points (as they mark the interfaces between an application acting as information server and one acting as information consumer).

The emphasis of MPEG-7 is the provision of novel solutions for audio-visual content description. Thus, addressing text-only documents was not among the goals of MPEG-7. However, audio-visual content may include or refer to text in addition to its audio-visual information. MPEG-7 has therefore standardized different Description Tools for textual annotation and controlled vocabularies, taking into account existing standards and practices.

1.4 MPEG-7 Application Areas
The elements that MPEG-7 standardizes provide support to a broad range of applications (for example, multimedia digital libraries, broadcast media selection, multimedia editing, home entertainment devices, etc.). MPEG-7 will also make the web as searchable for multimedia content as it is searchable for text today. This applies especially to large content archives being made accessible to the public, as well as to multimedia catalogues enabling people to identify content for purchase. The information used for content retrieval may also be used by agents, for the selection and filtering of broadcast "push" material or for personalized advertising. Additionally, MPEG-7 descriptions will allow fast and cost-effective usage of the underlying data, by enabling semi-automatic multimedia presentation and editing.
All application domains making use of multimedia will benefit from MPEG-7. Considering that nowadays it is hard to find a domain that does not use multimedia, the list of examples below can easily be extended:
• Architecture, real estate, and interior design (e.g., searching for ideas).
• Broadcast media selection (e.g., radio channel, TV channel).
• Cultural services (history museums, art galleries, etc.).
• Digital libraries (e.g., image catalogues, musical dictionaries, bio-medical imaging catalogues, film, video and radio archives).
• E-commerce (e.g., personalized advertising, on-line catalogues, directories of e-shops).
• Education (e.g., repositories of multimedia courses, multimedia search for support material).
• Home entertainment (e.g., systems for the management of personal multimedia collections, including manipulation of content, e.g. home video editing, searching a game, karaoke).
• Investigation services (e.g., human characteristics recognition, forensics).
• Journalism (e.g., searching speeches of a certain politician using his name, his voice or his face).
• Multimedia directory services (e.g., yellow pages, tourist information, geographical information systems).
• Multimedia editing (e.g., personalized electronic news service, media authoring).
• Remote sensing (e.g., cartography, ecology, natural resources management).
• Shopping (e.g., searching for clothes that you like).
• Social (e.g., dating services).
• Surveillance (e.g., traffic control, surface transportation, non-destructive testing in hostile environments).
The way MPEG-7 descriptions will be used to answer user queries or filtering operations is outside the scope of the standard. The type of content and the query do not have to be the same; for example, visual material may be queried and filtered using visual content, music, speech, etc. It is the responsibility of the search engine and filter agent to match the query data to the MPEG-7 description.
A few query examples are:
• Play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g. in terms of emotions.
• Draw a few lines on a screen and find a set of images containing similar graphics, logos, ideograms, ...
• Define objects, including color patches or textures, and retrieve examples among which you select the interesting objects to compose your design.
• On a given set of multimedia objects, describe movements and relations between objects and so search for animations fulfilling the described temporal and spatial relations.
• Describe actions and get a list of scenarios containing such actions.
• Using an excerpt of Pavarotti's voice, obtain a list of Pavarotti's records, video clips where Pavarotti is singing, and photographic material portraying Pavarotti.
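The first query example (humming or playing a few notes) can be sketched as contour-based melody matching: both the query and the stored melodies are reduced to the sign of the pitch interval between successive notes (up / down / repeat), which is the spirit of contour-based melody description. The reduction and the database format here are inventions of this sketch, not the normative melody Description Tools.

```python
# Sketch of query-by-humming on melodic contour. +1 = step up, -1 = step
# down, 0 = repeated pitch. Notes are given as MIDI note numbers.
def contour(notes):
    return [(a < b) - (a > b) for a, b in zip(notes, notes[1:])]

database = {
    "tune_a": [60, 62, 64, 62, 60],
    "tune_b": [60, 60, 67, 67, 69],
}
query = [55, 57, 59, 57, 55]  # same shape as tune_a, different key

matches = [name for name, notes in database.items()
           if contour(notes) == contour(query)]
print(matches)  # ['tune_a']
```

Because only the contour is compared, the match is transposition-invariant: the hummed query need not be in the same key as the stored melody.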
1.5 MPEG-7 parts
The MPEG-7 Standard consists of the following parts:
1. MPEG-7 Systems - the tools needed to prepare MPEG-7 descriptions for efficient transport and storage, and the terminal architecture.
2. MPEG-7 Description Definition Language - the language for defining the syntax of the MPEG-7 Description Tools and for defining new Description Schemes.
3. MPEG-7 Visual - the Description Tools dealing with (only) Visual descriptions.
4. MPEG-7 Audio - the Description Tools dealing with (only) Audio descriptions.
5. MPEG-7 Multimedia Description Schemes - the Description Tools dealing with generic features and multimedia descriptions.
6. MPEG-7 Reference Software - a software implementation of relevant parts of the MPEG-7 Standard with normative status.
7. MPEG-7 Conformance Testing - guidelines and procedures for testing conformance of MPEG-7 implementations.
8. MPEG-7 Extraction and use of descriptions - informative material (in the form of a Technical Report) about the extraction and use of some of the Description Tools.
9. MPEG-7 Profiles and levels - provides guidelines and standard profiles.
10. MPEG-7 Schema Definition - specifies the schema using the Description Definition Language.
Besides the different official MPEG-7 parts, there are also MPEG-7 Liaisons within the broader scope of the MPEG Liaisons activity. MPEG Liaisons deals with organizing formal collaboration between MPEG and other related activities under development in other standardization bodies. Currently MPEG-7 related liaisons include, among others, SMPTE, TV-Anytime, EBU P/Meta, Dublin Core and W3C.
1.6 Method of Work and Development Schedule
The method of development has been comparable to that of the previous MPEG standards. MPEG work is usually carried out in three stages: definition, competition, and collaboration. In the definition phase, the scope, objectives and requirements for MPEG-7 were defined. In the competitive stage, participants worked on their technology by themselves. The end of this stage was marked by the MPEG-7 Evaluation following an open Call for Proposals (CfP). The Call asked for relevant technology fitting the requirements. In answer to the Call, all interested parties, no matter whether they participate or have participated in MPEG, were invited to submit their technology to MPEG. Some 60 parties submitted, in total, almost 400 proposals, after which MPEG made a fair expert comparison between these submissions.
Selected elements of different proposals were incorporated into a common model (the eXperimentation Model, or XM) during the collaborative phase of the standard with the goal of building the best possible model, which was in essence a draft of the standard itself. During the collaborative phase, the XM was updated and improved in an iterative fashion, until MPEG-7 reached the Committee Draft (CD) stage in October 2000, after several versions of the Working Draft. Improvements to the XM were made through Core Experiments (CEs). CEs were defined to test the existing tools against new contributions and proposals, within the framework of the XM, according to well-defined test conditions and criteria. Finally, those parts of the XM (or of the Working Draft) that corresponded to the normative elements of MPEG-7 were standardized.
The main MPEG-7 (version 1) development schedule is shown below; MPEG-7 reached International Standard status in 2001, although, as shown in the next table, the first parts weren't published by ISO until 2002:
Call for Proposals October 1998
Evaluation February 1999
First version of Working Draft (WD) December 1999
Committee Draft (CD) October 2000
Final Committee Draft (FCD) February 2001
Final Draft International Standard (FDIS) July 2001
International Standard (IS) September 2001
The following table shows the current status of the MPEG-7 standard. The parts marked in italics are still in the development process, whilst the others have been published by ISO or will be shortly (those where the year is not yet complete).
Pt  Year/Cor/Amd      Title                                                      Status
1   2002              Systems
    Amd 1:200x        Systems Extensions                                         FDAM
    Amd 2:200x        Systems Extensions: Efficient Random Access                WD
2   2002              Description definition language
3   2002              Visual
    Amd 1:2004        Visual Descriptors Extensions
    Amd 2:200x        Visual Descriptors Further Extensions                      WD
4   2002              Audio
    2002/Amd 1:2004   Audio extensions
    2002/Amd 2:200x   Audio Descriptors Further Extensions                       WD
5   2003              Multimedia description schemes
    2003/Amd 1:2004   MDS extensions
    2003/Amd 2:200x   Multimedia description schemes user preference extensions  FDAM
6   2003              Reference software
    Amd 1:200x                                                                   FDAM
7   2003              Conformance testing
8   2002              Extraction and use of MPEG-7 descriptions
9   200x              Profiles and levels                                        FDIS
10  200x              Schema definition                                          FDIS
1.7 Document structure
This overview document is structured in five sections besides the introduction and several annexes. Each section is divided into several subsections, each devoted to the different MPEG-7 parts:
• Section 2 describes the major functionalities,
• Section 3 contains a detailed technical overview of version 1,
• Section 4 describes current work regarding profiling within MPEG-7, and
• Section 5 describes the work beyond version 1, including the current amendments that constitute version 2 and work under development towards version 3.
2. Major functionalities in MPEG-7
The following subsections (MPEG-7 part ordered) contain the major functionalities offered by the different parts of the MPEG-7 standard.
2.1 MPEG-7 Systems
MPEG-7 Systems currently includes the binary format for encoding MPEG-7 descriptions and the terminal architecture.
2.2 MPEG-7 Description Definition Language
According to the definition in the MPEG-7 Requirements Document the Description Definition Language (DDL) is:
"... a language that allows the creation of new Description Schemes and, possibly, Descriptors. It also allows the extension and modification of existing Description Schemes."
The DDL is based on the XML Schema language. Because XML Schema was not designed specifically for audiovisual content description, certain MPEG-7 extensions have been added. As a consequence, the DDL can be broken down into the following logical normative components:
• The XML Schema structural language components;
• The XML Schema datatype language components;
• The MPEG-7-specific extensions.
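Among the MPEG-7-specific additions to plain XML Schema are datatypes suited to audiovisual description, such as array/matrix types and media-time types. The sketch below parses a simplified time point of the form "T<hh>:<mm>:<ss>" into seconds; the real lexical forms also carry fractions and date parts, so this is an illustration rather than the normative grammar.

```python
# Sketch of handling a simplified MPEG-7-style media time point. The
# normative time datatypes are richer; this parses only "Thh:mm:ss".
def parse_timepoint(text):
    assert text.startswith("T"), "expected a time point like 'T00:01:30'"
    hh, mm, ss = (int(part) for part in text[1:].split(":"))
    return hh * 3600 + mm * 60 + ss

print(parse_timepoint("T00:01:30"))  # 90
```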
2.3 MPEG-7 Visual
MPEG-7 Visual Description Tools consist of basic structures and Descriptors that cover the following basic visual features: color, texture, shape, motion, localization, and face recognition. Each category consists of elementary and sophisticated Descriptors.
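In the spirit of the motion category above, an object's trajectory can be summarized by simple quantities such as its path length and net displacement. The summary fields and units here are illustrative, not the normative motion Descriptors.

```python
# Sketch of a trajectory summary for a tracked object region. Coordinates
# are illustrative (e.g., pixel positions over successive frames).
def trajectory_summary(points):
    path = sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
    (x0, y0), (xn, yn) = points[0], points[-1]
    net = ((xn - x0) ** 2 + (yn - y0) ** 2) ** 0.5
    return {"path_length": path, "net_displacement": net}

summary = trajectory_summary([(0, 0), (3, 4), (6, 8)])
print(summary)  # {'path_length': 10.0, 'net_displacement': 10.0}
```

A meandering object would show a path length much larger than its net displacement, which is the kind of distinction a motion feature can expose to a search engine.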
2.4 MPEG-7 Audio
MPEG-7 Audio provides structures, in conjunction with the Multimedia Description Schemes part of the standard, for describing audio content. Building on those structures are a set of low-level Descriptors for audio features that cut across many applications (e.g., spectral, parametric, and temporal features of a signal), and high-level Description Tools that are more specific to a set of applications. The high-level tools include general sound recognition and indexing Description Tools, instrumental timbre Description Tools, spoken content Description Tools, an audio signature Description Scheme, and melodic Description Tools to facilitate query-by-humming.
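A frame-based low-level temporal feature can be sketched as follows, in the spirit of a signal-power descriptor: mean squared amplitude per frame. The frame size, hop and scaling are choices of this sketch, not normative values.

```python
# Sketch of a frame-based low-level audio feature: mean squared sample
# amplitude per non-overlapping frame (illustrative, not normative).
def frame_power(samples, frame_size):
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples), frame_size)]
    return [sum(s * s for s in f) / len(f) for f in frames]

powers = frame_power([0.0, 1.0, -1.0, 0.0, 0.5, -0.5, 0.5, -0.5], 4)
print(powers)  # [0.5, 0.25]
```

Such per-frame series are exactly the kind of compact, application-neutral feature that many audio search and filtering applications can share.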
2.5 MPEG-7 Multimedia Description Schemes
MPEG-7 Multimedia Description Schemes (also called MDS) comprises the set of Description Tools (Descriptors and Description Schemes) dealing with generic as well as multimedia entities.
Generic entities are features that are used in audio and visual descriptions, and are therefore "generic" to all media - for instance, "vector", "time", textual description tools, controlled vocabularies, etc.
Apart from this set of generic Description Tools, more complex Description Tools are standardized. They are used whenever more than one medium needs to be described (e.g. audio and video). These Description Tools can be grouped into five different classes according to their functionality:
• Content description: representation of perceivable information;
• Content management: information about the media features, the creation and the usage of the AV content;
• Content organization: representation of the analysis and classification of several AV contents;
• Navigation and access: specification of summaries and variations of the AV content;
• User interaction: description of user preferences and usage history pertaining to the consumption of the multimedia material.
2.6 MPEG-7 Reference Software: the eXperimentation Model
The eXperimentation Model (XM) software is the simulation platform for the MPEG-7 Descriptors (Ds), Description Schemes (DSs), Coding Schemes (CSs), and Description Definition Language (DDL). Besides the normative components, the simulation platform also needs some non-normative components, essentially procedural code that is executed on the data structures. The data structures and the procedural code together form the applications. The XM applications are divided into two types: the server (extraction) applications and the client (search, filtering and/or transcoding) applications.
2.7 MPEG-7 Conformance
MPEG-7 Conformance includes the guidelines and procedures for testing conformance of MPEG-7 implementations.
2.8 MPEG-7 Extraction and use of descriptions
The MPEG-7 "Extraction and Use of descriptions" Technical Report includes informative material about the extraction and use of some of the Description Tools, both providing additional insight into MPEG-7 Reference Software implementation as well as alternative approaches.
2.9 MPEG-7 Profiles
The MPEG-7 "Profiles and Levels" collects standard profiles and levels for MPEG-7, specified across ISO/IEC 15938 parts. While all parts are potential candidates for profiling, current Profiles concentrate on the Description Definition Language [ISO/IEC 15938-2], Visual [ISO/IEC 15938-3], Audio [ISO/IEC 15938-4], Multimedia Description Schemes [ISO/IEC 15938-5], which are based on the namespace versioning defined in Schema Definition [ISO/IEC 15938-10].
2.10 MPEG-7 Schema Definition
The MPEG-7 "Schema Definition" collects the complete MPEG-7 schemas, collecting them from the different standards, corrigenda and amendments.
3. Detailed technical description of the MPEG-7 Technologies
This section contains a detailed overview of the different MPEG-7 technologies that are currently standardized.
First the MPEG-7 Multimedia Description Schemes are described, as the other Description Tools (the Visual and Audio ones) are always used wrapped in MPEG-7 MDS descriptions. Afterwards the Visual and Audio Description Tools are described in detail. Then the DDL is described, paving the ground for describing the MPEG-7 formats, both textual (TeM) and binary (BiM). Then the MPEG-7 terminal architecture is presented, followed by the Reference Software. Finally the MPEG-7 Conformance specification and the Extraction and Use of Descriptions Technical Report are explained.
3.1 MPEG-7 Multimedia Description Schemes
MPEG-7 Multimedia Description Schemes (DSs) are metadata structures for describing and annotating audio-visual (AV) content. The DSs provide a standardized way of describing in XML the important concepts related to AV content description and content management in order to facilitate searching, indexing, filtering, and access. The DSs are defined using the MPEG-7 Description Definition Language (DDL), which is based on the XML Schema Language, and are instantiated as documents or streams. The resulting descriptions can be expressed in a textual form (i.e., human-readable XML for editing, searching, filtering) or a compressed binary form (i.e., for storage or transmission). This section provides an overview of the MPEG-7 Multimedia DSs and describes their targeted functionality and use in multimedia applications.
The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and access of audio-visual (AV) content by enabling interoperability among devices and applications that deal with AV content description. MPEG-7 describes specific features of AV content as well as information related to AV content management. MPEG-7 descriptions take two possible forms: (1) a textual XML form suitable for editing, searching, and filtering, and (2) a binary form suitable for storage, transmission, and streaming delivery. Overall, the standard specifies four types of normative elements: Descriptors, Description Schemes (DSs), a Description Definition Language (DDL), and coding schemes.
The MPEG-7 Descriptors are designed primarily to describe low-level audio or visual features such as color, texture, motion, and audio energy, as well as attributes of AV content such as location, time, and quality. It is expected that most Descriptors for low-level features will be extracted automatically in applications.
On the other hand, the MPEG-7 DSs are designed primarily to describe higher-level AV features such as regions, segments, objects, events; and other immutable metadata related to creation and production, usage, and so forth. The DSs produce more complex descriptions by integrating together multiple Descriptors and DSs, and by declaring relationships among the description components. In MPEG-7, the DSs are categorized as pertaining to the multimedia, audio, or visual domain. Typically, the multimedia DSs describe content consisting of a combination of audio, visual data, and possibly textual data, whereas, the audio or visual DSs refer specifically to features unique to the audio or visual domain, respectively. In some cases, automatic tools can be used for instantiating the DSs, but in many cases instantiating DSs requires human assisted extraction or authoring tools.
The objective of this section is to provide an overview of the MPEG-7 Multimedia Description Schemes (DSs) being developed as part of the MPEG-7 standard. The structure of the section is as follows: Section 3.1.1 briefly reviews the organization of the MPEG-7 DSs and highlights the most relevant aspects of the different classes of DSs. Then, Sections 3.1.2 to 3.1.6 describe in more detail the specific design and functionalities of the MPEG-7 Multimedia DSs.
3.1.1 Organization of MDS tools
Figure 4 provides an overview of the organization of MPEG-7 Multimedia DSs into the following areas: Basic Elements, Content Description, Content Management, Content Organization, Navigation and Access, and User Interaction.
Figure 4: Overview of the MPEG-7 Multimedia DSs
3.1.1.1 Basic Elements
MPEG-7 provides a number of Schema Tools that assist in the formation, packaging, and annotation of MPEG-7 descriptions. An MPEG-7 description begins with a root element that signifies whether the description is complete or partial. A complete description provides a complete, standalone description of AV content for an application. On the other hand, a description unit carries only partial or incremental information that possibly adds to an existing description. In the case of a complete description, an MPEG-7 top-level element follows the root element. The top-level element orients the description around a specific description task, such as the description of a particular type of AV content (for instance an image, video, audio, or multimedia) or a particular function related to content management (such as creation, usage, summarization, and so forth). The top-level types collect together the appropriate tools for carrying out the specific description task. In the case of description units, the root element can be followed by an arbitrary instance of an MPEG-7 DS or Descriptor. Unlike a complete description, which usually contains a "semantically complete" MPEG-7 description, a description unit can be used to send a partial description as required by an application, such as a description of a place, a shape and texture descriptor, and so on. The Package DS describes a user-defined organization of MPEG-7 DSs and Ds into a package, which allows the organized selection of MPEG-7 tools to be communicated to a search engine or user. Furthermore, the DescriptionMetadata DS describes metadata about the description, such as creation time, extraction instrument, version, confidence, and so forth.
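As a rough sketch of the difference between a complete description and a description unit, the following Python snippet builds skeletons of both with the standard library's ElementTree. The element names loosely follow the standard's vocabulary, but namespaces, schema types, and most of the required structure are omitted; treat it as an illustration, not a schema-valid instance.

```python
import xml.etree.ElementTree as ET

def complete_description(title: str) -> ET.Element:
    """A simplified complete description: root element, then a
    top-level Description element orienting the description task."""
    root = ET.Element("Mpeg7")                      # root element
    desc = ET.SubElement(root, "Description")       # top-level element
    content = ET.SubElement(desc, "MultimediaContent")
    video = ET.SubElement(content, "Video")
    creation = ET.SubElement(video, "CreationInformation")
    ET.SubElement(creation, "Title").text = title
    return root

def description_unit(place: str) -> ET.Element:
    """A description unit: the root is followed directly by an
    arbitrary partial description, here a single place."""
    root = ET.Element("Mpeg7")
    unit = ET.SubElement(root, "DescriptionUnit")
    ET.SubElement(unit, "Place").text = place
    return root

doc = complete_description("Concert recording")
print(ET.tostring(doc, encoding="unicode"))
```

A consumer can tell the two forms apart by whether a Description or a DescriptionUnit child follows the root, which is exactly the complete/partial distinction described above.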
A number of basic elements are used throughout the MDS specification as fundamental constructs in defining the MPEG-7 DSs. The basic data types provide a set of extended data types and mathematical structures, such as vectors and matrices, which are needed by the DSs for describing AV content. The basic elements also include constructs for linking media files, localizing pieces of content, and describing time, places, persons, individuals, groups, organizations, and textual annotation. We briefly discuss the MPEG-7 approaches for describing time and textual annotations.
Figure 5: Overview of the Time DSs
Temporal Information: the DSs for describing time are based on the ISO 8601 standard, which has also been adopted by the XML Schema language. The Time DS and MediaTime DS describe time information in the real world and in media streams, respectively. Both follow the same strategy described in Figure 5. Figure 5.A illustrates the simplest way to describe a temporal instant and a temporal interval. A time instant, t1, can be described by a lexical representation using the Time Point. An interval, [t1, t2], can be described by its starting point, t1, (using the Time Point) and a Duration, t2 - t1. An alternative way to describe a time instant is shown in Figure 5.B. It relies on Relative Time Point. The instant, t1, is described by a temporal offset with respect to a reference, t0, called Time Base. Note that the goal of the Relative Time Point is to define a temporal instant, t1, and not an interval as the Duration in Figure 5.A. Finally, Figure 5.C illustrates the specification of time using a predefined interval called Time Unit and counting the number of intervals. This specification is particularly efficient for periodic or sampled temporal signals. Since the strategy consists of counting Time Units, the specification of a time instant has to be done relative to a Time Base (or temporal origin). In Figure 5.C, t1 is defined with a Relative Incremental Time Point by counting 8 Time Units (starting from t0). An interval [t1, t2], can also be defined by counting Time Units. In Figure 5.C, Incremental Duration is used to count 13 Time Units to define the interval [t1, t2].
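The Time Unit counting scheme of Figure 5.C amounts to simple arithmetic: a time point is a count of units from a time base, and a duration is a count of units. The sketch below assumes a 25 fps video, so one Time Unit is 1/25 s; the concrete values (a 10 s time base, counts of 8 and 13 units) are made up for the example.

```python
from fractions import Fraction

# Assumed Time Unit for 25 fps video: one frame lasts 1/25 s.
TIME_UNIT = Fraction(1, 25)

def relative_incremental_time_point(n_units, time_base=Fraction(0)):
    """t1 = time base + n counted Time Units (Figure 5.C)."""
    return time_base + n_units * TIME_UNIT

def incremental_duration(n_units):
    """Interval length expressed by counting Time Units."""
    return n_units * TIME_UNIT

t0 = Fraction(10)                               # time base: 10 s into the stream
t1 = relative_incremental_time_point(8, t0)     # 8 Time Units after t0
t2 = t1 + incremental_duration(13)              # interval [t1, t2] of 13 Time Units
print(t1, t2)                                   # 258/25 271/25, i.e. 10.32 s and 10.84 s
```

Exact rational arithmetic (Fraction) mirrors why counting units is attractive for sampled signals: no rounding accumulates, whatever the frame rate.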
Textual Annotation: text annotation is an important component of many DSs. MPEG-7 provides a number of different basic constructs for textual annotation. The most flexible text annotation construct is the data type for free text. Free text allows the formation of an arbitrary string of text, which optionally includes information about the language of the text. However, MPEG-7 also provides a tool for more structured textual annotation, with specific fields corresponding to the questions "Who? What object? What action? Where? When? Why? and How?". Moreover, more complex textual annotations can also be defined by describing explicitly the syntactic dependency between the grammatical elements forming sentences (for example, the relation between a verb and a subject, etc.). This last type of textual annotation is particularly useful for applications where the annotation will be processed automatically. Lastly, MPEG-7 provides constructs for classification schemes and controlled terms. The classification schemes provide a language-independent set of terms that form a vocabulary for a particular application or domain. Controlled terms are used in descriptions to make reference to the entries in the classification schemes. Allowing controlled terms to be described by classification schemes offers advantages over the standardization of fixed vocabularies for different applications and domains, since it is likely that the vocabularies for multimedia applications will evolve over time.
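The structured annotation with question-style fields can be sketched as follows. The slot names mirror the standard's questions, but the real schema wraps each value in richer term structures that are omitted here; this is an illustrative builder, not the normative syntax.

```python
import xml.etree.ElementTree as ET

# The question-style slots of the structured annotation.
ALLOWED_SLOTS = {"Who", "WhatObject", "WhatAction", "Where", "When", "Why", "How"}

def structured_annotation(**fields):
    """Build a simplified StructuredAnnotation element from
    question-style keyword arguments."""
    ann = ET.Element("StructuredAnnotation")
    for name, value in fields.items():
        if name not in ALLOWED_SLOTS:
            raise ValueError(f"unknown annotation slot: {name}")
        ET.SubElement(ann, name).text = value
    return ann

ann = structured_annotation(Who="goalkeeper", WhatAction="saves",
                            Where="penalty area")
print(ET.tostring(ann, encoding="unicode"))
```

Restricting the fields to a fixed set of slots is what makes this form easier to process automatically than free text.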
3.1.1.2 Content Management
MPEG-7 provides DSs for AV content management. These tools describe the following information: (1) creation and production, (2) media coding, storage and file formats, and (3) content usage. (Many of the components of the content management DSs are optional; the instantiation of the optional components is often decided in view of the specific multimedia application.) More details about the MPEG-7 tools for content management follow.
The Creation Information describes the creation and classification of the AV content and of other related materials. The Creation information provides a title (which may itself be textual or another piece of AV content), textual annotation, and information such as creators, creation locations, and dates. The classification information describes how the AV material is classified into categories such as genre, subject, purpose, language, and so forth. It also provides review and guidance information such as age classification, parental guidance, and subjective review. Finally, the Related Material information describes whether there exist other AV materials related to the content being described.
The Media Information describes the storage media, such as the format, compression, and coding of the AV content. The Media Information DS identifies the master media, which is the original source from which different instances of the AV content are produced. The instances of the AV content are referred to as Media Profiles, which are versions of the master obtained perhaps by using different encodings, or storage and delivery formats. Each Media Profile is described individually in terms of the encoding parameters, storage media information and location.
The Usage Information describes usage information related to the AV content, such as usage rights, usage record, and financial information.
The rights information is not explicitly included in the MPEG-7 description; instead, links are provided to the rights holders and other information related to rights management and protection. The Rights DS provides these references in the form of unique identifiers that are under management by external authorities. The underlying strategy is to enable MPEG-7 descriptions to provide access to the current rights owner information without dealing with the information and negotiation directly. The Usage Record DS and Availability DS provide information related to the use of the content such as broadcasting, on-demand delivery, CD sales, and so forth. Finally, the Financial DS provides information related to the cost of production and the income resulting from content use. The Usage Information is typically dynamic, in that it is subject to change during the lifetime of the AV content. Many of the individual DSs for content management are presented in more detail in Section 3.1.2.
3.1.1.3 Content Description
MPEG-7 provides DSs for description of the structure and semantics of AV content. The structural tools describe the structure of the AV content in terms of video segments, frames, still and moving regions and audio segments. The semantic tools describe the objects, events, and notions from the real world that are captured by the AV content.
The functionality of each of these classes of DSs is given as follows:
Structural aspects: describes the audio-visual content from the viewpoint of its structure. The Structure DSs are organized around a Segment DS that represents the spatial, temporal or spatio-temporal structure of the audio-visual content. The Segment DS can be organized into a hierarchical structure to produce a Table of Contents for accessing, or an Index for searching, the audio-visual content. The Segments can be further described on the basis of perceptual features, using MPEG-7 Descriptors for color, texture, shape, motion, audio features, and so forth, and of semantic information, using Textual Annotations. The MPEG-7 Structure DSs are further discussed in Section 3.1.3.
Conceptual aspects: describes the audio-visual content from the viewpoint of real-world semantics and conceptual notions. The Semantic DSs involve entities such as objects, events, abstract concepts and relationships. The Structure DSs and Semantic DSs are related by a set of links, which allows the audio-visual content to be described on the basis of both content structure and semantics together. The links relate different Semantic concepts to the instances within the audio-visual content described by the Segments. The MPEG-7 Semantic DSs are further discussed in Section 3.1.3.2.
Many of the individual DSs for content description are presented in more detail in Section 3.1.3. Most of the MPEG-7 content description and content management DSs are linked together, and in practice the DSs are included within each other in MPEG-7 descriptions. For example, Usage information, Creation and Production information, and Media information can be attached to individual Segments identified in the MPEG-7 description of the audio-visual content structure. Depending on the application, some aspects of the audio-visual content description can be emphasized, such as the Semantics or Creation description, while others can be minimized or ignored, such as the Media or Structure description.
3.1.1.4 Navigation and Access
MPEG-7 also provides DSs for facilitating browsing and retrieval of audio-visual content by defining summaries, partitions and decompositions, and variations of the audio-visual material.
Summaries: provide compact summaries of the audio-visual content to enable discovery, browsing, navigation, visualization and sonification of audio-visual content. The Summary DSs involve two types of navigation modes: hierarchical and sequential. In the hierarchical mode, the information is organized into successive levels, each describing the audio-visual content at a different level of detail. In general, the levels closer to the root of the hierarchy provide more coarse summaries, and levels further from the root provide more detailed summaries. The sequential summary provides a sequence of images or video frames, possibly synchronized with audio, which may compose a slide-show or audio-visual skim.
Partitions and Decompositions: describe different decompositions of the audio-visual signals in space, time and frequency. The partitions and decompositions can be used to describe different views of the audio-visual data, which is important for multi-resolution access and progressive retrieval.
Variations: provide information about different variations of audio-visual programs, such as summaries and abstracts; scaled, compressed and low-resolution versions; and versions with different languages and modalities (audio, video, image, text, and so forth). One of the targeted functionalities of the Variation DS is to allow the selection of the most suitable variation of an audio-visual program, which can replace the original, if necessary, to adapt to the different capabilities of terminal devices, network conditions or user preferences.
The Navigation and Access DSs are described in more detail in Section 3.1.4.
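The hierarchical summary mode described above (coarse levels near the root, finer levels below) can be sketched as a small tree structure. The class and field names are hypothetical; the normative HierarchicalSummary DS has a richer structure.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryLevel:
    """One node of a hierarchical summary: key frames describing the
    content at this level, plus child nodes that refine it."""
    key_frames: list                       # frame numbers at this level
    children: list = field(default_factory=list)

def frames_at_level(node, level):
    """Collect the key frames at a given depth: level 0 is the
    coarsest view, deeper levels add detail."""
    if level == 0:
        return list(node.key_frames)
    frames = []
    for child in node.children:
        frames.extend(frames_at_level(child, level - 1))
    return frames

root = SummaryLevel([0], [
    SummaryLevel([0, 120]),      # first half of the program
    SummaryLevel([300, 450]),    # second half
])
print(frames_at_level(root, 1))  # finer level: [0, 120, 300, 450]
```

A sequential summary, by contrast, would simply be one flat, time-ordered list of frames, possibly synchronized with audio.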
3.1.1.5 Content Organization
MPEG-7 also provides DSs for organizing and modeling collections of audio-visual content and of descriptions. The Collection DS organizes collections of audio-visual content, segments, events, and/or objects, which allows each collection to be described as a whole based on its common properties. In particular, different models and statistics may be specified for characterizing the attribute values of the collections. The Content Organization DSs are described in more detail in Section 3.1.5.
3.1.1.6 User Interaction
The last set of MPEG-7 DSs deals with User Interaction. The User Interaction DSs describe user preferences and usage history pertaining to the consumption of the multimedia material. This allows, for example, matching between user preferences and MPEG-7 content descriptions in order to facilitate personalization of audio-visual content access, presentation and consumption. The main features of the User Interaction DSs are described in Section 3.1.6.
3.1.2 Content management
The Content Management Description Tools allow the description of the life cycle of the content, from creation to consumption.
The content described by MPEG-7 descriptions can be available in different modalities and formats, encoded with different Coding Schemes, and stored as several instances. For example, a concert can be recorded in two different modalities: audio and audio-visual. Each of these modalities can be encoded with different Coding Schemes. This creates several media profiles. Finally, several instances of the same encoded content may be available. These concepts of modality, profile and instance are illustrated in Figure 6.
Figure 6. Model for content, profile and instance
• Content: One reality, such as a concert in the world, can be represented as several types of media, e.g., audio media or audio-visual media. A content entity has a specific structure to represent that reality.
• Media Information: The physical format of a content entity is described by the Media Information DS. One description instance of the DS is attached to each content entity. The DS is centered on an identifier for the content entity and also has sets of Descriptors for the storage format of the entity.
• Media Profile: One content entity can have one or more media profiles that correspond to different Coding Schemes of the entity. One of the profiles is the original one, called the master profile, which corresponds to the initially created or recorded version. The others are transcoded from the master. If the content is encoded with the same encoding tool but with different parameters, different media profiles are created.
• Media Instance: A content entity can be instantiated as physical entities called media instances. An identifier and a locator specify each media instance.
• CreationInformation: Information about the creation process of a content entity is described by the CreationInformation DS. One description instance of the DS is attached to each content entity.
• UsageInformation: Information about the usage of a content entity is described by the UsageInformation DS. One description instance of the DS is attached to each content entity.
The only part of the description that depends on the storage media or the encoding format is the MediaInformation described in this section. The remaining part of the MPEG-7 description does not depend on the various profiles or instances and, as a result, can be used to describe jointly all possible copies of the content.
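The content/profile/instance model of Figure 6 can be sketched as plain data classes. Names, fields, and the sample coding formats are illustrative, not the normative DS syntax; the point is the one-to-many relationships and the single master profile per content entity.

```python
from dataclasses import dataclass, field

@dataclass
class MediaInstance:
    instance_id: str
    locator: str                 # e.g. a URL to one physical copy

@dataclass
class MediaProfile:
    coding: str                  # coding scheme and parameters
    is_master: bool = False      # the initially created/recorded version
    instances: list = field(default_factory=list)

@dataclass
class Content:
    title: str
    profiles: list = field(default_factory=list)

    def master(self):
        """Return the master profile; the others are transcoded from it."""
        return next(p for p in self.profiles if p.is_master)

concert = Content("Concert", profiles=[
    MediaProfile("PCM 48 kHz", is_master=True,
                 instances=[MediaInstance("i1", "file:///archive/concert.wav")]),
    MediaProfile("MP3 128 kbit/s",
                 instances=[MediaInstance("i2", "http://example.org/concert.mp3")]),
])
print(concert.master().coding)   # PCM 48 kHz
```

Only the MediaProfile layer depends on the encoding; everything hung off Content applies jointly to all copies, which is exactly the point made in the text above.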
3.1.2.1 Media Description Tools
The description of the media involves a single top-level element, the MediaInformation DS. It is composed of an optional MediaIdentification D and one or several MediaProfile Ds.
The Media Identification D contains Description Tools that are specific to the identification of the AV content, independently of the different available instances. The different media profiles of the content are described via their Media Profile and for each Profile there can be different media instances available.
The Media Profile D contains the different Description Tools that allow the description of one profile of the AV content being described. The profile concept refers to the different variations that can be produced from an original or master media depending on the values chosen for the coding, storage format, etc. The profile corresponding to the original or master copy of the AV content is considered the master media profile. For each profile there can be one or more media instances.
The MediaProfile D is composed of:
• MediaFormat D: contains Description Tools that are specific to the coding format of the media profile.
• MediaInstance D: contains the Description Tools that identify and locate the different media instances (copies) available of a media profile.
• MediaTranscodingHints D: contains Description Tools that specify transcoding hints for the media being described. The purpose of this D is to improve quality and reduce complexity for transcoding applications. The transcoding hints can be used in video transcoding and motion estimation architectures to reduce computational complexity.
• MediaQuality D: represents quality rating information for audio or visual content. It can be used to represent both subjective and objective quality ratings.
3.1.2.2 Creation & production Description Tools
The creation and production Description Tools describe author-generated information about the generation/production process of the AV content. This information usually cannot be extracted from the content itself: it is related to the material but not explicitly depicted in it.
The description of the creation and production information has as its top-level element the CreationInformation DS, which is composed of one Creation D, zero or one Classification D, and zero or several RelatedMaterial Ds.
The Creation D contains the Description Tools related to the creation of the content, including places, dates, actions, materials, staff (technical and artistic) and organizations involved.
The Classification D contains the Description Tools that allow classifying the AV content. It enables searching and filtering based on user preferences regarding user-oriented classifications (e.g., language, style, genre, etc.) and service-oriented classifications (e.g., purpose, parental guidance, market segmentation, media review, etc.).
The Related Material D contains the Description Tools related to additional information about the AV content available in other materials.
3.1.2.3 Content usage Description Tools
The content usage information Description Tools describe information about the usage process of the AV content.
The description of the usage information is enabled by the UsageInformation DS, which may include one Rights D, zero or one Financial D, and zero or several Availability Ds and UsageRecord Ds.
It is important to note that the UsageInformation DS description may incorporate new descriptions each time the content is used (e.g., UsageRecord DS, Income in the Financial datatype), or when there are new ways to access the content (e.g., Availability D).
The Rights datatype gives access to information about the rights holders of the annotated content (IPR) and the access rights.
The Financial datatype contains information related to the costs generated and the income produced by AV content. The notions of partial costs and incomes allow the classification of different costs and incomes as a function of their type. Total and subtotal costs and incomes are to be calculated by the application from these partial values.
The Availability DS contains the Description Tools related to the availability for use of the content.
The UsageRecord DS contains the Description Tools related to the past use of the content.
3.1.3 Content Description
3.1.3.1 Description of the content structural aspects
The core element of this part of the description is the Segment DS. It addresses the description of the physical and logical aspects of audio-visual content. Segment DSs may be used to form segment trees. MPEG-7 also specifies a Graph DS that allows the representation of complex relations between segments. It is used to describe spatio-temporal relationships between segments that are not captured by the tree structures.
A segment represents a section of an audio-visual content item. The Segment DS is an abstract class (in the sense of object-oriented programming). It has nine major subclasses: Multimedia Segment DS, AudioVisual Region DS, AudioVisual Segment DS, Audio Segment DS, Still Region DS, Still Region 3D DS, Moving Region DS, Video Segment DS and Ink Segment DS. A segment may therefore have both spatial and temporal properties. A temporal segment may be a set of samples in an audio sequence, represented by an Audio Segment DS, a set of frames in a video sequence, represented by a Video Segment DS, or a combination of both audio and visual information described by an Audio Visual Segment DS. A spatial segment may be a region in an image or a frame in a video sequence, represented by a Still Region DS for 2D regions and a Still Region 3D DS for 3D regions. A spatio-temporal segment may correspond to a moving region in a video sequence, represented by a Moving Region DS, or a more complex combination of visual and audio content, for example represented by an AudioVisual Region DS. The InkSegment DS describes a temporal interval or segment of electronic ink data, which corresponds to a set of content ink strokes and/or meta ink strokes. Finally, the most generic segment is the Multimedia Segment DS, which describes a composite of segments that form a multimedia presentation. The Segment DS is abstract and cannot be instantiated on its own: it is used to define the common properties of its subclasses. Any segment may be described by creation information, usage information, media information and textual annotation. Moreover, a segment can be decomposed into sub-segments through the Segment Decomposition DS.
The Segment DS describes the result of a spatial, temporal, or spatio-temporal partitioning of the AV content. The Segment DS can describe a recursive or hierarchical decomposition of the AV content into segments that form a segment tree. The SegmentRelation DS describes additional spatio-temporal relationships among segments.
The Segment DS forms the base abstract type of the different specialized segment types: audio segments, video segments, audio-visual segments, moving regions, and still regions. As a result, a segment may have spatial and/or temporal properties. For example, the AudioSegment DS can describe a temporal audio segment corresponding to a temporal period of an audio sequence. The VideoSegment DS can describe a set of frames of a video sequence. The AudioVisualSegment DS can describe a combination of audio and visual information such as a video with synchronized audio. The StillRegion DS can describe a spatial segment or region of an image or a frame in a video. Finally, the MovingRegion DS can describe a spatio-temporal segment or moving region of a video sequence.
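The pattern of an abstract Segment base with specialized subclasses and recursive decomposition into a segment tree can be sketched as follows. The class names echo the DSs, but the model is illustrative; the normative DSs carry far more structure (media time, media information, and so on).

```python
class Segment:
    """Illustrative base class: common properties shared by all
    segment types, plus a decomposition into sub-segments forming
    a segment tree. (The real Segment DS is abstract.)"""
    def __init__(self, annotation=""):
        self.annotation = annotation
        self.sub_segments = []

    def decompose(self, *children):
        """Attach sub-segments (cf. the Segment Decomposition DS)."""
        self.sub_segments.extend(children)
        return self

class VideoSegment(Segment): pass    # a set of frames
class AudioSegment(Segment): pass    # a set of audio samples
class StillRegion(Segment): pass     # a 2D region of an image/frame

def count_segments(segment):
    """Size of the segment tree rooted at `segment`."""
    return 1 + sum(count_segments(s) for s in segment.sub_segments)

scene = VideoSegment("whole scene").decompose(
    VideoSegment("shot 1").decompose(StillRegion("key frame")),
    VideoSegment("shot 2"),
)
print(count_segments(scene))   # 4
```

The recursive decomposition is what yields the Table-of-Contents-style segment trees mentioned earlier.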
There exists also a set of specialized segments for specific types of AV content. For example, the Mosaic DS is a specialized type of StillRegion. It describes a mosaic or panoramic view of a video segment, constructed by aligning and warping the frames of a VideoSegment upon each other using a common spatial reference system. The VideoText and the InkSegment DSs are two subclasses of the MovingRegion DS. The VideoText DS describes a region of video content corresponding to text or captions; this includes superimposed text as well as text appearing in the scene. The InkSegment DS describes a segment of an electronic ink document created by a pen-based system or an electronic whiteboard.
Since the Segment DS is abstract, it cannot be instantiated on its own. However, the Segment DS contains elements and attributes that are common to the different segment types. Among the common properties of segments is information related to creation, usage, media location, and text annotation.
The Segment DS can be used to describe segments that are not necessarily connected, but composed of several non-connected components. Connectivity refers here to both the spatial and temporal domains. A temporal segment (Video Segment, Audio Segment or AudioVisual Segment) is said to be temporally connected if it is a sequence of continuous video frames or audio samples. A spatial segment (Still Region) is said to be spatially connected if it is a group of connected pixels. A spatio-temporal segment (Moving Region) is said to be spatially and temporally connected if the temporal segment where it is instantiated is temporally connected and if each one of its temporal instantiations in frames is spatially connected (note that this is not the classical connectivity in a 3D space).
Figure 7: Examples of segments: a) and b) segments composed of one single connected component; c) and d) segments composed of three connected components
Figure 8: Examples of connected a) and non-connected b) moving region
Figure 7 illustrates several examples of temporal and spatial segments and their connectivity. Figures 7.a) and 7.b) illustrate a temporal and a spatial segment, each composed of a single connected component. Figures 7.c) and 7.d) illustrate segments composed of three connected components.
COMPRESSED DATA FROM MPEG FILES.ppt (Size: 158.5 KB / Downloads: 102)
This article is presented by Malaya Kumar Das.
A VIDEO RETRIEVAL SYSTEM BASED ON COMPRESSED DATA FROM MPEG FILES
WHAT DOES IT DO?
The system uses a two-stage indexing and retrieval approach:
first, video shots are classified into separate databases using their motion characteristics;
second, the selected database is searched using colour features.
The two-stage approach reduces the search time by almost half, with a slight reduction in retrieval rate.
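A minimal sketch of this two-stage approach, with illustrative field names (the actual VIRS data structures and distance measure are not specified here): stage one narrows the search to the per-type database selected by motion characteristics, and stage two ranks the shots in that database by colour-histogram distance.

```python
def two_stage_retrieve(query_type, query_colour_hist, shot_db, top_k=3):
    """Stage 1: restrict the search to the database of the requested video
    type (Sport, News, Drama, Scenery). Stage 2: rank the shots in that
    database by L1 distance between colour histograms."""
    candidates = shot_db.get(query_type, [])  # stage 1: type-based narrowing

    def colour_distance(shot):  # stage 2: colour-feature matching
        return sum(abs(a - b)
                   for a, b in zip(shot["colour_hist"], query_colour_hist))

    return sorted(candidates, key=colour_distance)[:top_k]
```

Because only one type-specific database is scanned, roughly the other three quarters of the collection are never compared against the colour query, which is how the search time drops.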
COMPONENTS IN VIRS
Colour Feature Representation
Motion Feature Representation
VIDEO DATABASE POPULATION
Four video categories (or types): Sport, News, Drama, and Scenery.
Colour is quantised into 10 colours.
In VIRS each r-frame is partitioned into nine equal rectangular regions: N, NW, W, SW, S, SE, E, NE and Center.
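The r-frame representation above can be sketched as follows: the frame, given as a 2-D grid of colour indices already quantised to 10 colours, is split into a 3x3 grid of regions and each region is summarised by its dominant colour index. The function name and the choice of "dominant colour per region" as the summary are assumptions for illustration.

```python
from collections import Counter

# 3x3 region labels in row-major grid order (the set matches the nine
# VIRS regions; the ordering here is illustrative)
REGIONS = ["NW", "N", "NE", "W", "Center", "E", "SW", "S", "SE"]

def region_colour_signature(frame):
    """Partition an r-frame (2-D grid of quantised colour indices) into nine
    equal rectangular regions and record the dominant colour of each."""
    h, w = len(frame), len(frame[0])
    sig = {}
    for i in range(3):
        for j in range(3):
            pixels = [frame[r][c]
                      for r in range(i * h // 3, (i + 1) * h // 3)
                      for c in range(j * w // 3, (j + 1) * w // 3)]
            sig[REGIONS[i * 3 + j]] = Counter(pixels).most_common(1)[0][0]
    return sig
```

Two r-frames can then be compared region by region, which is what makes a rough spatial colour sketch (e.g. "blue at the top, green at the bottom") usable as a query.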
VIDEO DATABASE QUERY
The graphical query interface allows a user to make a query in two steps:
a type query.
a colour sketch query.
The user specifies the type of video shot he or she wishes to retrieve.
The user then enters the colour-based information about the r-frame to be searched from the database of the selected video type.
The graphical query interface consists of:
A visual colour chart of 10 quantised colours with known RGB vectors.
A visual shape chart of common shapes.
A visual texture chart of common textures with known feature vectors.
A menu of video type buttons.
This two-step retrieval strategy reduces the search time by half on average, with a slight reduction in retrieval rate. The system returns a ranked list of three or more candidates. The user can click on a returned r-frame to play the retrieved video shot and, hence, the video clip.