Today, the image search experience offered by all major commercial image search engines is embarrassing. This is because these image search engines are either:
1. Using non-image correlations, such as image file names and the text in the vicinity of the images, to guess what the images are about; or
2. Using low-level image features, such as colors, textures and primary shapes, for content-based indexing and retrieval.
The first kind of image search engine is very efficient at finding objects and scenes that have precise, unambiguous and unique text descriptions, such as “Times Square” and “Golden Gate Bridge”. However, even in this case there are still many problems. First, since such objects and scenes are usually well known and documented by thousands of images and videos, one might need to narrow the results down to a more specific subset; say, one might be more interested in “Times Square in rain”, “Times Square three yellow cabs” or “Golden Gate Bridge and one man”. Since this kind of search engine in fact uses the text and titles around an image to guess its contents, it is entirely blind to what is really in the image. The more specific the information the user wants, the worse the experience the user suffers.
Another scenario in which this kind of text-based image search engine performs poorly is when the text around the images is ambiguous or vague. For example, “White House” can mean many things to us. It can be the White House building itself. It can be some event that happened near the White House. Or it can even be any house that is white. Try searching for “White House” in Google Images and you will see what this is all about.
The problem with the second kind of image search engine is that, while low-level image features are important for describing images, they fail to represent the high-level semantic and cognitive features of images, because they are only the basic components from which cognitive features are built. This problem is easy to understand by taking a brief look at the mainstream porn-detection software available on the market. These packages only inspect skin-tone regions in images and may misclassify many innocent images, as shown in the survey of porn-detection software packages.
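The skin-tone failure mode can be sketched in a few lines. This is a minimal illustration, not the method of any particular product: the RGB thresholds are a common rule of thumb, and the names `looks_like_skin`, `skin_fraction` and `flag_by_skin_tone`, the 0.5 threshold and the sample pixel values are all assumptions made for the example.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def looks_like_skin(pixel: Pixel) -> bool:
    """Rule-of-thumb RGB skin test (illustrative thresholds only)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels: Iterable[Pixel]) -> float:
    """Fraction of pixels that pass the skin-tone test."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(looks_like_skin(p) for p in pixels) / len(pixels)


def flag_by_skin_tone(pixels: Iterable[Pixel], threshold: float = 0.5) -> bool:
    """Flag an image purely on its skin-tone coverage (hypothetical threshold)."""
    return skin_fraction(pixels) >= threshold


# Two innocent toy "images", given as flat pixel lists.
face_closeup = [(224, 172, 105)] * 70 + [(30, 30, 30)] * 30    # skin plus dark hair
sandy_beach = [(210, 180, 140)] * 80 + [(80, 140, 220)] * 20   # sand plus blue sky

print(flag_by_skin_tone(face_closeup))  # flagged, although it is only a portrait
print(flag_by_skin_tone(sandy_beach))   # flagged, although no person is present
```

Both toy images are flagged because the test looks only at color and knows nothing about what the colored regions depict.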
Another problem with the second kind of image search engine is that it lacks the ability to scale up. As the number of images and the number of categories in the image base grow, the classifiers in this kind of image search engine can easily be overwhelmed by inter-class confusion and intra-class diversity.
A look at the embarrassing progress made in face recognition software shows how serious the scale-up issue is. While many face recognition packages claim to be over 99% accurate when recognizing faces from a fixed database, none of them can really be applied to recognizing suspects in the real-time video streams of an airport.
Why does this happen? The secret is that while a database might contain 1 million face samples, that is still too few compared with a video stream generating 20 images per second. Just imagine an Internet with thousands of webcams and millions of digital cameras, scanners, cell phone cameras and digital video cameras: the ability to scale up is the first problem any image search engine must solve.
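A quick back-of-the-envelope calculation makes the mismatch concrete. The two input figures come from the text above; the constant and function names, and the choice of 1,000 cameras as a stand-in for "thousands of webcams", are assumptions for illustration.

```python
FRAMES_PER_SECOND = 20        # video stream rate quoted in the text
DATABASE_SIZE = 1_000_000     # face samples in the fixed database, from the text
SECONDS_PER_DAY = 24 * 60 * 60

# Images produced by a single camera in one day.
frames_per_day = FRAMES_PER_SECOND * SECONDS_PER_DAY

# Hours a single camera needs to produce as many images as the whole database holds.
hours_to_match_database = DATABASE_SIZE / FRAMES_PER_SECOND / 3600


def frames_per_day_for(cameras: int) -> int:
    """Daily image volume for a given number of cameras (hypothetical helper)."""
    return cameras * frames_per_day


print(frames_per_day)                     # 1728000
print(round(hours_to_match_database, 1))  # 13.9
print(frames_per_day_for(1000))           # 1728000000
```

Under these figures a single camera outgrows the 1-million-sample database in well under a day.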
Then what is the ultimate solution for building a smart image search engine? The answer is to build real image recognizers for all objects in the world, piece by piece, based on hierarchical structures that mimic the cognitive image-understanding abilities of the human brain. There are two levels of structure to be addressed.
1. At the lower level of these hierarchical structures we must build a set of feature detectors capable of recognizing all low-level features, such as mouths, eyes, faces, trees, poles, light bulbs, mugs, tables, pandas, tigers, President Washington, Times Square, Wall Street, etc. Yes, this is a whole lot of work, and it seems to be a mission impossible with mainstream image understanding technology because of the prohibitive amount of manpower and computing resources, which would grow exponentially with the number of recognizers.
Therefore, the first problem in building a smart image search engine is to find a way to build a bank of image feature recognizers whose demands on manpower and computing resources grow linearly with the number of recognizers. This is known as the scale-up challenge.
2. At the higher level, we need to make out the “layout”, the “meaning” and the “intuition” behind the image. When a human being looks at an image, the “meaning” of the image is the much more important aspect s/he is looking for. In a word, we look at everything, then focus on the most interesting portion of the image and try to understand it. The cognitive features of images play the most important role in the understanding of images. This is the level at which people search for images: people search using cognitive features rather than signal features. Since cognitive features are coded in natural language while signal features are coded in data, searching images indexed by cognitive features is much more efficient and accurate than searching images indexed by signal features.
What are the cognitive features of images? From the point of view of computational cognition, a cognitive feature of an image is a feature that can be described using computational nouns and computational verbs, which are two indispensable components of Physical Linguistics. Unlike many image search engines based on low-level features, which view images under context-free assumptions, PicSeer views each picture under context-rich scenarios. For example, when PicSeer looks at a picture of a person, it doesn’t look only at clues of color and texture as other image search engines do. Instead, PicSeer looks for eyes, face, hands, legs, hair, clothes, facial expressions, gestures and background. PicSeer uses its Physical Linguistic Modeling Engine to organize the layout of the picture, to arrange the relations between the different cognitive features in the image and to provide a cognitive model for the entire image. In a word, PicSeer translates any image of interest into a story coded in a pseudo-natural language.
For example, PicSeer can translate the following picture into the story “A boy smiles”.
How can PicSeer have this kind of understanding of images? The Physical Linguistic Vision Technologies can represent cognitive features as nouns and verbs, called computational nouns and computational verbs, respectively. In this case, the image of the boy is represented by the computational noun “boy” and the facial expression of the boy is represented by the computational verb “smile”. All these steps are done automatically by the computer itself.
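The last step, turning a recognized noun and verb into a short story, can be sketched as follows. This is only a toy illustration of the idea: the classes `ComputationalNoun` and `ComputationalVerb` and the function `tell_story` are hypothetical names, the recognition itself is assumed to have happened already, and the verb conjugation is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputationalNoun:
    """A recognized object in the image, e.g. "boy" (hypothetical structure)."""
    name: str


@dataclass(frozen=True)
class ComputationalVerb:
    """A recognized action or expression, e.g. "smile" (hypothetical structure)."""
    name: str


def tell_story(noun: ComputationalNoun, verb: ComputationalVerb) -> str:
    """Compose a one-sentence story from a noun/verb pair.

    Naive grammar: picks "A"/"An" from the first letter and appends "s"
    to the verb, which is enough for the example but not for English in general.
    """
    article = "An" if noun.name[0].lower() in "aeiou" else "A"
    return f"{article} {noun.name} {verb.name}s"


# Assume upstream detectors produced these two cognitive features.
story = tell_story(ComputationalNoun("boy"), ComputationalVerb("smile"))
print(story)  # A boy smiles
```

Because the story is plain text, it can be indexed and searched with ordinary text-search techniques.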
Without using high-level cognitive features, an image search engine can still play many tricks to make out the contents of an image. For example, on the assumption that the images on a webpage are closely related to its text, Google categorizes the images on a webpage based on all the related text near them, such as file names, the webpage title and more. However, the search results can be entirely surprising! The following are some examples that test the technologies behind Google.
On November 12, 2005, Google was queried with the keywords “boy smiles”, and the following is the first page of the search results. The third thumbnail in the first row is a surprise because it contains neither a boy nor a smile. This shows that Google knows neither the cognitive features of a boy nor the cognitive features of a smile.
On November 12, 2005, Google was queried with the keywords “boy smile”, and the following is the first page of the search results. Comparing this with the previous result, we reach the following conclusions:
1. Google doesn’t take into account the meanings behind the query terms. To Google, “boy smile” and “boy smiles” are entirely different search criteria. This is, of course, cognitively incorrect.
2. The image features used by Google have no cognitive significance.
On November 12, 2005, Google was queried with the keywords “boy smiled”, and the following is the first page of the search results. Confused? Yes, the computers did their jobs well, but the results were not quite what we had in mind.
Other mainstream commercial image search engines perform similarly to what is shown above, because the principles behind them are much the same. The failure of these image search engines is caused by the low-level image features they use and by the inconsistency and randomness in the relations between the images and the text surrounding them.
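The three queries above differ only in the inflection of one word, so a search engine that normalized its query terms would treat them as the same request. The sketch below shows the idea with a deliberately crude suffix stripper; `toy_stem` and `normalize_query` are hypothetical names, and a real system would use a proper stemmer or lemmatizer rather than these four rules.

```python
from typing import Tuple


def toy_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters.

    A toy rule set for illustration only; it mangles many real words.
    """
    word = word.lower()
    for suffix in ("es", "ed", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_query(query: str) -> Tuple[str, ...]:
    """Map a free-text query to a tuple of stemmed terms."""
    return tuple(toy_stem(w) for w in query.split())


queries = ["boy smiles", "boy smile", "boy smiled"]
normalized = {normalize_query(q) for q in queries}
print(normalized)  # a single entry: all three queries collapse to the same key
```

With this normalization all three query strings map to one index key, so they would return the same result page.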
Precision Image Search::Cognitive & Semantic Image Search Engine
Click to Download PicSeerDemo Package (48MB) or unzip this package to C:\PicSeerDemo
Click to Download Manual for PicSeerDemo PDF
Click to Download White Papers for PicSeerDemo (under construction) PDF
Applications of PicSeer: automatic video annotation, security, event detection, ITS, etc.
A Lightweight Key Image Search SDK for Embedded Systems such as PDAs and Cellular Phones.
The Growing-up History of PicSeer.
The image search engine PicSeer, developed at Yang's Scientific Research Institute, LLC., USA (Yang's), operates at a semantic level by using Yang's unique Physical Linguistic Vision Technologies. Unlike many existing image search engines, which use only low-level image features such as color, texture and primary shape features, PicSeer uses cognitive features of images to build its search index. PicSeer leads the paradigm shift of commercial image search engines and has already found use in many image-to-story applications, such as fire detection and vehicle recognition for Intelligent Traffic Systems (ITS). (Download the demo version of PicSeer [here]. If you fail to install it, you can also unzip this file to C:\PicSeerDemo.)
(November 12, 2005, Tucson, Arizona, USA.)
Once Over Lightly
|PicSeer is a smart image search engine that can search inside pictures. Take a look at the following search result for the search text “people panda”: you can see that PicSeer understands that what the user really wanted was images in which both people and pandas appear.|
|The image base is very small, with fewer than 2 million images, so only a few images show both people and pandas. From the enlarged pictures of the first three query results one can see that:
1. PicSeer does give high scores to images in which both people and pandas appear.
2. PicSeer can detect pandas whether they are real or artificial and whatever their poses.
3. The third picture reveals an interesting result: a woman wearing clothing on which an image of a panda is printed in pink ink. This shows that PicSeer can detect pandas and people in any color.|
|You can also put position, color, size and many other descriptions into your search text. For example, if you want to search for images with people and pandas positioned toward the left, the search text can be “person panda left”, and the query result is as follows. Observe that this time a new result was returned and took fifth place, showing a girl feeding panda cubs. Places 1 to 4 are unchanged because those images have people on the left.|
|What happens if I only search for “panda”? Well, the pictures containing only pandas take the first places, as shown in the following result for the search text “panda”.|
|Armed with sophisticated algorithms, PicSeer is an ever-growing program. Day by day, it keeps gathering images from the Internet, learns its image-detecting skills from these images and posts the detection results into a central database. Therefore, one can expect PicSeer to become smarter and smarter over time. Don’t believe it? Let us keep a brief history of PicSeer to see what skills it mastered as it grew up.|
|On 2/19/2006, PicSeer could find the Golden Gate Bridge and pick out people who had their pictures taken in front of it. The following is the result for the search string “golden gate bridge people”.|
|PicSeer can also tell the weather conditions around the Golden Gate Bridge. The following is the search result for “golden gate bridge foggy”.|
|Scheduled tasks: sunflower, FBI seal.|
|Want to see what PicSeer can find for the images you are most interested in? You are welcome to send your search string to picseer(at)yangsky(DoT)us. PicSeer will give priority to searching for the images that most people are interested in.|
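The position-aware queries described above, such as “person panda left”, suggest a query format in which some words name objects and others act as modifiers. The following is a minimal sketch of how such a query could be split and used for ranking. It is not PicSeer's actual algorithm: the modifier vocabulary, the `Detection` structure, the scoring weights and the sample detections are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical modifier vocabulary; the real one is not documented in this text.
MODIFIERS = {"left", "right", "center", "big", "small"}


@dataclass(frozen=True)
class Detection:
    label: str       # recognized object, e.g. "person" or "panda"
    x_center: float  # horizontal position: 0.0 = left edge, 1.0 = right edge


def parse_query(text: str) -> Tuple[List[str], List[str]]:
    """Split a free-form search text into object words and modifier words."""
    objects: List[str] = []
    modifiers: List[str] = []
    for word in text.lower().split():
        (modifiers if word in MODIFIERS else objects).append(word)
    return objects, modifiers


def score(detections: List[Detection], query: str) -> float:
    """One point per requested object found, plus a bonus for a match on the left."""
    objects, modifiers = parse_query(query)
    matched = [d for d in detections if d.label in objects]
    points = float(len({d.label for d in matched}))
    if "left" in modifiers and any(d.x_center < 0.5 for d in matched):
        points += 0.5
    return points


# Toy detections standing in for three indexed images.
images = {
    "person_left_of_panda": [Detection("person", 0.2), Detection("panda", 0.7)],
    "person_right_of_panda": [Detection("person", 0.8), Detection("panda", 0.6)],
    "panda_only": [Detection("panda", 0.5)],
}
ranking = sorted(images, key=lambda k: score(images[k], "person panda left"), reverse=True)
print(ranking)
```

Images that contain every requested object rank first, and the position modifier only breaks ties among them, which mirrors the behavior described for the “person panda left” query.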
Understanding-based Image Search Engine: Results
Searching for images with PicSeer is as simple as typing your story into any text editor. PicSeer provides a free-form text search interface that is user-friendly, flexible and yet accurate. The following are some screenshots of search results from the demo version of PicSeer, called PicSeerDemo, which can be downloaded from the aforementioned link. For a more detailed description of the technologies behind PicSeer, please click here.
|(Note: PicSeer is a system that grows in real time along with the image base and the computing resources at Yang’s. Therefore, this page is under constant revision to reflect the latest achievements of PicSeer. The screenshots differ in appearance because they come from different versions of PicSeer.)|
|Case 1: When we enter the search text "woman" we get the following search result. Observe that PicSeer finds all pictures in which human females appear. This option is for a wide-ranging search of all human females. To refine the search result one can use more specific search texts, as demonstrated later. (220.127.116.11)|
|Case 2: When we enter the search text "woman big center" we get the following search result. Observe that PicSeer returned only those results in which the women's faces were big enough and centered in the images. This option is ideal for searching for ID-photo-like images. (18.104.22.168)|
|Case 3: When we enter the search text "woman small" we get the following search result. Observe that PicSeer found those results in which the women's faces were small in the images. This option is ideal for searching for full-body photos of human females. (22.214.171.124)|
|Warning: for scientific research purposes, the following two screenshots contain explicit pictures that might not be completely blocked by the software itself. If you are not allowed to view, or do not want to view, even a minimal amount of explicit content, please click here to skip the explicit content and continue to the survey of the state-of-the-art commercial image search engines.|
|Case END-1: Smart Porn-Blocking Modules--Breast Detectors. PicSeer has a smart porn detector built in and can target the exact regions that constitute offensive content. For example, if we input the search string "porn detection on + breast big", we get the following search results, in which almost all breast regions are blocked by solid green rectangles. Observe that the green regions were added automatically by PicSeer itself. (126.96.36.199)|
|Case END: Smart Porn-Blocking Modules--P**sy Detectors. The porn detector used by PicSeer is not only smart but also covers a wide spectrum. For example, if we use the search string "porn detection on + pussy white", we get the following search result. Observe that almost all regions containing p**sy have been blocked by green rectangles. Again, all green blocking regions were added automatically by PicSeer. (188.8.131.52)|
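The size and position modifiers used in Cases 1 to 3 above ("woman", "woman big center", "woman small") can be read as filters on the bounding box of the detected face. The sketch below illustrates that reading under stated assumptions: the `Box` structure, the area thresholds and the centering tolerance are hypothetical values chosen for the example, not PicSeer's actual parameters.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """Face bounding box in coordinates normalized to the image (0.0 to 1.0)."""
    label: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def is_big(box: Box, min_area: float = 0.15) -> bool:
    """The box covers at least min_area of the image (hypothetical threshold)."""
    return box.w * box.h >= min_area


def is_small(box: Box, max_area: float = 0.05) -> bool:
    """The box covers at most max_area of the image (hypothetical threshold)."""
    return box.w * box.h <= max_area


def is_centered(box: Box, tolerance: float = 0.2) -> bool:
    """The box center lies within tolerance of the image center."""
    cx, cy = box.x + box.w / 2, box.y + box.h / 2
    return abs(cx - 0.5) <= tolerance and abs(cy - 0.5) <= tolerance


CHECKS = {"big": is_big, "small": is_small, "center": is_centered}


def matches(box: Box, label: str, modifiers: Iterable[str]) -> bool:
    """True if the box has the requested label and passes every modifier check."""
    return box.label == label and all(CHECKS[m](box) for m in modifiers)


id_photo = Box("woman", x=0.25, y=0.20, w=0.50, h=0.60)   # large, centered face
full_body = Box("woman", x=0.45, y=0.05, w=0.10, h=0.12)  # small face near the top

print(matches(id_photo, "woman", ["big", "center"]))   # ID-photo-like image
print(matches(full_body, "woman", ["small"]))          # full-body-like image
print(matches(full_body, "woman", ["big", "center"]))  # rejected by the size check
```

With no modifiers at all, `matches` accepts both boxes, which corresponds to the wide-ranging search in Case 1.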