Tuesday, July 08, 2008

HUMAINE WEB? User Friendly WWW? web 3.0 SEMANTIC WEB!!!

Semantic Web: A Web Beyond Keywords

The first version gave information at your fingertips. The second allowed you to interact with that information. It's time for the third phase: an intelligent Web.

Mastufa Ahmed

Saturday, July 05, 2008

The World Wide Web is entering its next phase, the Semantic Web, bringing in a new paradigm called Web 3.0. The term Semantic Web was coined by Tim Berners-Lee, the man who invented the (first) World Wide Web. In a Semantic Web, machines can read and interpret web pages much as humans do. Today, we can link one Web page to another, but we can't link their data together. As a result, we browse through the links and then look for the right data within those pages. Even when you use a search engine, you enter keywords and get a set of links to websites where related information is available. Search engines don't answer your specific query; they don't return the data, just the links. Social networking sites are trying to improve on this with tagging. The Semantic Web goes beyond keywords and into natural language processing. So instead of typing in keywords, you can type in your complete question, and the Semantic Web will try to find the answer.

So, the Semantic Web refers to the technology of precise vocabularies. Though this kind of natural language processing has been in progress for years, it has only recently started to take off. Start-ups like Powerset, TextDigger and Hakia are working on semantic search engines. A Semantic Web agent does not necessarily include artificial intelligence. Instead, it relies on structured sets of information and inference rules that allow it to understand the relationships between data sources. A computer may not understand information the way humans can, but it has enough information to create logical connections and take decisions accordingly.

In the Semantic Web, the data itself becomes a part of the Web and is processed irrespective of platform, application or domain, unlike the World Wide Web, which holds endless information in the form of documents. We can search for documents on the World Wide Web, but their interpretation is left to humans. The Semantic Web, on the other hand, is about data as well as documents, so that machines can process and even act on the data in practical ways. So in the non-semantic Web (Web 1.0 and Web 2.0), the word 'snake' is treated simply as the word 'snake'. However, in the Semantic Web (part of Web 3.0), it would be treated as

Let's take another example. A semantic search engine can answer questions like 'Which Indian author won the Booker Prize in the year 1997?' It applies reasoning based on the fact that the Web knows the difference between the names of Indian Booker winners, the respective years and even the names of the books.
If we search for the keywords 'Semantic Web' in Google, it shows all sites containing information about it. However, in a Semantic Web search such as the one provided by Powerset, you get the definition of 'Semantic Web' along with relevant links.
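The Booker Prize question can be sketched as a query over structured facts instead of a keyword match. This is only an illustration: the fact store, the property names ("nationality", "won", "wrote") and the helper functions are assumptions made for this sketch, not how any particular search engine works.

```python
# Hypothetical structured facts of the kind a semantic search engine might hold.
# Each fact is a (subject, property, value) statement; the property names are
# invented for this sketch.
FACTS = [
    ("Arundhati Roy", "nationality", "Indian"),
    ("Arundhati Roy", "won", ("Booker Prize", 1997)),
    ("Arundhati Roy", "wrote", "The God of Small Things"),
    ("Kiran Desai", "nationality", "Indian"),
    ("Kiran Desai", "won", ("Booker Prize", 2006)),
    ("Kiran Desai", "wrote", "The Inheritance of Loss"),
    ("Ian McEwan", "nationality", "British"),
    ("Ian McEwan", "won", ("Booker Prize", 1998)),
    ("Ian McEwan", "wrote", "Amsterdam"),
]


def subjects_with(prop, value):
    """Return the set of subjects that have the given property value."""
    return {s for (s, p, v) in FACTS if p == prop and v == value}


def indian_booker_winner(year):
    """Answer 'Which Indian author won the Booker Prize in <year>?'."""
    indians = subjects_with("nationality", "Indian")
    winners = subjects_with("won", ("Booker Prize", year))
    return sorted(indians & winners)


print(indian_booker_winner(1997))
```

Because author, nationality, prize and year are separate pieces of data, the answer comes from intersecting two sets of facts and not from finding pages that happen to contain all the words.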

So the emphasis in the Semantic Web shifts to the back end. The Semantic Web is therefore a Web of relations between resources that signify real-world objects such as people, places and events. It is an extension of the current Web. There is a rich set of links from the Semantic Web to HTML documents. These relations typically connect a concept in the Semantic Web with the pages that are most relevant to it.

Another significant aspect of the Semantic Web is that multiple sites may contribute data about a particular resource. Without requiring any permission from any authority, all relevant data from various sites can extend the cumulative knowledge on the Semantic Web. This distributed extensibility is one of the most important aspects of the Semantic Web.
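A rough sketch of this distributed extensibility, assuming two hypothetical sites that each publish statements about the same resource. The URI under example.org, the property names and the merge function are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical URI for one resource, described independently by two sites.
BOOK = "http://example.org/books/the-god-of-small-things"

site_a = [
    (BOOK, "title", "The God of Small Things"),
    (BOOK, "author", "Arundhati Roy"),
]
site_b = [
    (BOOK, "author", "Arundhati Roy"),  # repeats a statement from site A
    (BOOK, "bookerPrizeYear", 1997),
]


def merge(*sources):
    """Union the statements from every source, grouped by resource URI."""
    knowledge = defaultdict(set)
    for source in sources:
        for subject, prop, value in source:
            knowledge[subject].add((prop, value))
    return knowledge


combined = merge(site_a, site_b)
for prop, value in sorted(combined[BOOK], key=str):
    print(prop, "=", value)
```

Neither site needs to know about the other. Because both use the same URI for the book, their statements combine into one description, and the repeated statement is stored only once.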

Powerset gives you a direct answer to a question, while a typical search engine like Google gives a list of sites which may not have sufficient information on the topic.

The World Wide Web is the biggest repository of information, and its content and range of knowledge keep growing, so its non-semantic nature may become a problem. In the future, it could be extremely difficult to make sense of this content. A search engine might help you find content containing specific keywords, but that content may not be relevant to what you are looking for. What is lacking is that search is based on the words in a page and not on the meaning of the page's contents. The Semantic Web, on the other hand, aims to tag content on the Web so that results contain relevant and precise information.

Semantic Websites

Semantic technology has started to take off with the recent launch of Websites like Powerset, TextDigger and Hakia. Let's take the case of powerset.com. This site took up the challenging task of applying natural language processing to search. Powerset's first product is a search engine for Wikipedia, launched in May '08. Powerset allows you to enter keywords, phrases or even questions directly. Instead of giving you a list of sites, Powerset in most cases answers questions directly. The difference between Powerset and traditional search engines like Yahoo! and Google is that the latter don't take into account stopwords like after, by, the, etc. Powerset, being semantically capable, takes all such stopwords into account and gives you the most relevant results. A search for 'Noam Chomsky' on Powerset gives you a direct result: a concise bio on the left with details of Chomsky on the right, which you wouldn't get from a typical search engine like Yahoo!
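To see why stopwords can matter, here is a minimal sketch of what happens when a query is reduced to a bag of keywords. The stopword list and the function are assumptions for illustration and do not describe how Powerset, Yahoo! or Google actually process queries.

```python
# A small illustrative stopword list; real engines use longer ones.
STOPWORDS = {"a", "an", "the", "by", "for", "after", "about", "of"}


def keyword_terms(query):
    """Reduce a query to its non-stopword terms, ignoring word order."""
    words = query.lower().split()
    return frozenset(w for w in words if w not in STOPWORDS)


q1 = "books by children"
q2 = "books for children"

# Both questions collapse to the same bag of keywords ...
print(keyword_terms(q1) == keyword_terms(q2))
# ... although the full word sequences differ.
print(q1.split() == q2.split())
```

The two queries ask for different things, yet once 'by' and 'for' are dropped a keyword matcher can no longer tell them apart.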

Technologies behind Semantic Web

The following components make up the technology behind the Semantic Web.

1. A global naming scheme with URIs: A URI (Uniform Resource Identifier) is simply a Web identifier, like the strings starting with http or ftp that we see on the World Wide Web. Anyone can create a URI. URIs form the base technology on top of which a Web is built. Anything that has a URI is considered to be on the Web. For instance, http://www.pcquest.com is a URI that identifies a resource (PCQuest's home page) and signifies that a representation of that resource (the home page's HTML code) can be reached through HTTP from a network host called www.pcquest.com. Every data object and every data schema/model in the Semantic Web must have a unique URI.

2. Resource Description Framework: Also known as RDF, this is a standard for describing data. RDF is a specification, commonly serialized as XML, for describing resources on the Web, intranets and extranets. RDF gives a reliable, consistent way to describe and query Internet resources, from text pages to audio files and video clips. It offers syntactic interoperability, and provides the base layer for building a Semantic Web. RDF defines a directed graph of relationships.

3. RDF Schema: This is a standard means to describe properties of data. RDF Schema is the semantic extension of RDF, and provides mechanisms to describe groups of related resources and the relationships between them. The class and property system of RDF Schema is akin to the type systems of object-oriented programming languages such as Java. Both RDF and RDF Schema are commonly written in XML and can use XML Schema datatypes. The existence of standards for describing data (RDF) and data attributes (RDF Schema) allows the development of a set of tools to read and exploit data from multiple sources.

In 'Pandorabots' you can exchange information and ask questions. The bot uses AIML to come up with the most relevant answer.

4. Ontologies (that use OWL, the Web Ontology Language): Semantic interoperability is required before multiple applications can identify data and treat it as information. Syntactic interoperability refers only to the correct parsing of data. Semantic interoperability requires mapping between terms, which needs content analysis. This content analysis in turn calls for proper and explicit specifications of domain models, which define the terms used and their relationships. Such formal domain models are sometimes called ontologies. Ontologies define data models in terms of classes, subclasses, and properties. The Web Ontology Language adds more vocabulary for defining properties and classes than RDF or RDF Schema. It can describe relations between classes, cardinality (for example, 'exactly two'), equality, richer typing of properties, and characteristics of properties (such as symmetry). OWL has three sublanguages: in order of decreasing expressiveness, they are OWL Full, OWL DL, and OWL Lite.
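The first three items above can be sketched together in a few lines. This is a minimal illustration that stores statements as plain Python tuples and does not use a real RDF library; the example.org vocabulary, the class names and the resource snake42 are assumptions invented for this sketch. The inference applied is the standard RDF Schema one: an instance of a class is also an instance of every superclass of that class.

```python
from urllib.parse import urlparse

# A URI names a resource; its parts say how a representation can be fetched.
parts = urlparse("http://www.pcquest.com")
print(parts.scheme, parts.netloc)

# Hypothetical vocabulary under example.org, invented for this sketch.
EX = "http://example.org/vocab#"
TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# RDF statements are (subject, predicate, object) triples: the edges of a
# directed graph whose nodes are the subjects and objects.
triples = {
    (EX + "Cobra", SUBCLASS_OF, EX + "Snake"),
    (EX + "Snake", SUBCLASS_OF, EX + "Reptile"),
    (EX + "snake42", TYPE, EX + "Cobra"),
}


def superclasses(cls):
    """Follow rdfs:subClassOf edges transitively from cls."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def types_of(resource):
    """Declared classes of a resource plus every inferred superclass."""
    declared = {o for s, p, o in triples if s == resource and p == TYPE}
    inferred = set()
    for cls in declared:
        inferred |= superclasses(cls)
    return declared | inferred


print(sorted(types_of(EX + "snake42")))
```

Only one type is stated for snake42, yet the schema lets a program conclude that it is also a Snake and a Reptile. This is the kind of logical connection a Semantic Web agent can draw without understanding the data the way a human does.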

Examples of ontologies include catalogs for online shopping sites like Amazon.com, domain-specific standard terminology like UNSPSC (a terminology used for products and services), or various taxonomies on the Web, like the 'My Yahoo' categories. The components of an OWL ontology are Classes, Properties and Individuals.


Classes are the basic building blocks of an OWL ontology. Classes typically represent a taxonomic hierarchy (a subclass-superclass hierarchy). OWL supports six main ways to define classes; the named class is the simplest among them. The other types are intersection classes, union classes, complement classes, restrictions, and enumerated classes.


Properties fall into two main categories: Object properties, which relate individuals to other individuals, and Datatype properties, which connect individuals to datatype values such as integers, floats, and strings. OWL makes use of XML Schema for defining datatypes.


Individuals are instances of classes. For example, you may describe an individual named John as an instance of the class Person, and use a property such as employer to relate John to the individual Cyber Media, signifying that John is an employee of Cyber Media.
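The John example can be sketched with a small data structure that keeps the two kinds of property apart. This is not OWL syntax, only an illustration of the idea; the Individual type, the class name Organization, the property name age and its value are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """An instance of one or more classes, with two kinds of properties."""
    name: str
    classes: set = field(default_factory=set)
    object_props: dict = field(default_factory=dict)    # property -> Individual
    datatype_props: dict = field(default_factory=dict)  # property -> literal


cyber_media = Individual("Cyber Media", classes={"Organization"})
john = Individual("John", classes={"Person"})

# Object property: relates one individual to another individual.
john.object_props["employer"] = cyber_media
# Datatype property: relates an individual to a literal value.
# The property name and the value are invented for this sketch.
john.datatype_props["age"] = 30


def is_employee_of(person, organization):
    """True if the person's employer property points at the organization."""
    return person.object_props.get("employer") is organization


print(is_employee_of(john, cyber_media))
```

The value of employer is another individual that can carry statements of its own, while the value of age is just a number. That is the distinction between Object and Datatype properties.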

ALICE, AIML and Chat Bot

AIML stands for Artificial Intelligence Markup Language, an XML dialect for creating natural language software agents. ALICE, short for Artificial Linguistic Internet Computer Entity, is a popular chat bot developed in the late 1990s by Dr. Richard Wallace. It was intended to enable conversational interaction between humans and computers. A bot (short for "robot") is a program that works as an agent for a user. On the Internet, the most ubiquitous bots are the programs, also called spiders or crawlers, that access Web sites and gather content for search engine indexes. One of the first and most famous chatterbots (prior to the Web) was Eliza, a program that pretended to be a psychotherapist and answered questions with other questions. ALICE uses AIML to respond to any question. We can chat with any AIML-based bot on any topic and ask it questions about anything. All such chat bots are semantically capable. AIML describes a class of data objects called AIML objects. AIML objects are made up of units called topics and categories, which contain parsed or unparsed data. AIML supports two ways to interface with other languages and systems: the <system> tag, which executes any program accessible as an operating system shell command and inserts the results in the reply, and the <javascript> tag, which allows arbitrary scripting inside the templates.
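In AIML, a category pairs an input pattern with a reply template. The sketch below imitates that idea in plain Python and is far simpler than a real AIML interpreter: the categories, the {star} placeholder and the first-match lookup are assumptions for illustration, and real interpreters use their own matching structure and priority rules.

```python
import re

# Hypothetical categories: each pairs a pattern with a reply template.
# "*" in a pattern matches one or more words; "{star}" in the template is
# this sketch's stand-in for AIML's <star/> element.
CATEGORIES = [
    ("HELLO", "Hi there!"),
    ("MY NAME IS *", "Nice to meet you, {star}."),
]
DEFAULT_REPLY = "I have no answer for that."


def normalize(text):
    """Upper-case the input and strip punctuation before matching."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()


def respond(text):
    """Return the template of the first category whose pattern matches."""
    sentence = normalize(text)
    for pattern, template in CATEGORIES:
        # Build a regex word by word: "*" becomes a capturing wildcard.
        regex = " ".join(
            "(.+)" if word == "*" else re.escape(word)
            for word in pattern.split()
        )
        match = re.fullmatch(regex, sentence)
        if match:
            star = match.group(1).title() if match.groups() else ""
            return template.format(star=star)
    return DEFAULT_REPLY


print(respond("My name is John!"))
```

A bot built this way matches surface patterns in the input. The quality of its answers depends on how many categories its author has written.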

Future of Semantic Web

Implementing OWL, RDF or the Semantic Web as a whole will be a continuing process. But whether the Semantic Web will benefit businesses and individuals is a matter of confusion among technology experts. Considering the way WWW technologies proliferated, it's plausible that this new version of the Web will realize its capabilities one day. However, it might initially be restricted to intranet and extranet applications until security questions are addressed adequately. Given the potential of the new Web, which is more about real data than anchored text or pages, it is hoped that the Semantic Web will advance human knowledge by allowing people to combine huge amounts of data in a dynamic and relevant way. Large IT companies are waiting for the development community to settle on standards, and this lack of consensus is a barrier to adoption. Another issue doing the rounds is reliability: in a world where anyone can publish anything, how far can the data be trusted? There is also a concern that we will stop exercising our own intelligence and become relatively unintelligent, at the disposal of a future Web destined to provide us with the information relevant to our search. These are the issues to be addressed before a semantic culture takes off in the true sense.


posted by u2r2h at Tuesday, July 08, 2008

