Week 4

This was a very full week: we covered three chapters, one of them the longest in the book. The topics were navigation systems, search systems, and controlled vocabularies, thesauri, and metadata. Read on for more IA fun!

Information Arch. Ch. 7, 8 & 9:

IA Chapter 7: “Navigation Systems”-Navigation systems keep us from getting lost. They tell us where we are, where we have been, and where we can go. Web browsers already provide many navigation tools, such as back and forward buttons, an open URL (enter URL) option, a history of URLs visited during the session, and color-coded hypertext links. It is a big mistake to design a website that disables the back button or bookmarking.


Context-it is important to show users where each page sits within the website as a whole. Users sometimes follow a search engine link and get dumped in the middle of a website, so they need to know where they are. The organization’s name, logo, and graphics should be used consistently on all pages so users always know what site they are on. The navigation system should also show the site hierarchy and indicate where the user is. To test this, go to a random page within the website and see whether you can tell what section you are in, what the parent page is, and where the page will lead next. The links should be descriptive enough to provide this information.
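A breadcrumb trail is one common answer to the “can you tell where you are?” test above. Below is a minimal Python sketch. It assumes a hypothetical site in which every page records its parent page. The page names and the `PARENT` table are invented for illustration and are not taken from the book.

```python
# Hypothetical site hierarchy: each page maps to its parent (None for the home page).
PARENT = {
    "Home": None,
    "Products": "Home",
    "Laptops": "Products",
    "Laptop Model X": "Laptops",
    "Support": "Home",
}


def breadcrumb(page: str) -> str:
    """Walk up from a page to the home page and return a 'you are here' trail."""
    trail = []
    current = page
    while current is not None:
        trail.append(current)
        current = PARENT[current]
    # The walk collected pages from leaf to root, so reverse it for display.
    return " > ".join(reversed(trail))
```

Calling `breadcrumb("Laptop Model X")` walks up the hierarchy and returns `Home > Products > Laptops > Laptop Model X`. On a real site the hierarchy would more likely come from the content management system than from a hand-written table.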


Global navigation systems-also known as site-wide navigation, these usually run across the top of a web page. They typically include a link to the home page and sometimes access to the search feature. More complex sites like Apple (www.apple.com) and Amazon (www.amazon.com) also use global navigation to convey the structure of the site and give clues to the user’s location within it.


Local navigation systems-these combine with global navigation systems to allow the user access to subsections. Some sites such as the New York Times (http://www.nytimes.com/) combine global and local navigation in the same framework by having an expandable navigation bar. Other sites like Microsoft.com (http://www.microsoft.com) have several different local navigation systems.


Contextual navigation-these options are used when the structured categories of global and local navigation don’t fit. They are often page-specific. One example is the “see also” links on e-commerce sites, which point to products that may go with the item being viewed. An educational or library site may link to additional articles on a subject similar to the one being viewed. These are often inline hypertext links.


Embedded navigation-navigation options are meant to give users flexibility, but one must be careful not to overwhelm users with too many choices. Global, local, and contextual navigation often exist together on a page. If they are designed separately they can take up a lot of screen space. They are most effective when designed together so they complement each other. The standard placement is a global navigation bar across the top with a local navigation bar down the left side.



[Figure: sample page layout with labeled areas for global navigation and content navigation]

Supplemental navigation systems-these include indexes, guides, site maps, and the ever-important search option. No matter how well designed a site’s global and local navigation systems are, or how great its taxonomy is, they will always fail a certain number of users. Supplemental systems are the backups that help these users find what they need:


  • Site maps-these show the top few layers of a site’s hierarchy. This gives the user a wide view of the site’s content and allows random access to segmented sections. The goal is to give lost users fast, direct access to content without overwhelming them.
  • Site indexes-these are similar to the index in the back of a book. Keywords, subjects, and phrases are listed alphabetically with no regard to hierarchy. A big challenge of preparing an index for a website is deciding the level at which you index-page level, paragraph level, concepts, collections of web pages, etc. This is where it helps to find out what terms your users are looking for.
  • Guides-these can be guided tours, tutorials, or specific help on a certain task. Usually they are linear in nature with screen shots to help new users along. They should be short and designed to answer questions. Users should be able to exit at any point.
  • Wizards & configurators-these help users configure products or work through many complex decisions. The Mini Cooper website (http://www.miniusa.com) uses a wizard that lets users design their own Mini Cooper.
  • Search-this is a very important part of the supplemental navigation system. It lets users put in their own keywords to find information. It allows for specific phrase searches which won’t appear in indexes or site maps. However, language ambiguity can cause problems with searches.
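The following Python sketch shows the difference between a site map and a site index. It runs over a hypothetical set of pages, and the URLs, depths, and keywords are made up. The index is built at the page level, which is one of the indexing levels mentioned above, and it ignores hierarchy. The site map does the opposite and keeps only the top layers.

```python
from collections import defaultdict

# Hypothetical pages: URL -> (depth in the hierarchy, keywords chosen by an indexer).
PAGES = {
    "/": (0, ["home"]),
    "/catalog": (1, ["catalog", "books"]),
    "/catalog/renew": (2, ["renewals", "due dates"]),
    "/hours": (1, ["hours", "holidays"]),
    "/hours/holiday-closings": (2, ["holidays", "closings"]),
}


def site_index(pages):
    """Flatten keywords into an A-Z index, ignoring where pages sit in the hierarchy."""
    index = defaultdict(list)
    for url, (_depth, keywords) in pages.items():
        for term in keywords:
            index[term].append(url)
    # Alphabetize the terms, and each term's URLs, like the index at the back of a book.
    return {term: sorted(urls) for term, urls in sorted(index.items())}


def site_map(pages, max_depth=1):
    """Keep only the top few layers of the hierarchy, as a site map does."""
    return sorted(url for url, (depth, _keywords) in pages.items() if depth <= max_depth)
```

Here `site_index(PAGES)["holidays"]` lists both holiday-related pages, even though they sit at different depths. `site_map(PAGES)` returns only `/`, `/catalog`, and `/hours`.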


Advanced navigation:


  • Personalization vs. customization-with personalization, the information presented is tailored to a user based on models of their behavior or preferences (we guess what the users want). Customization means giving users control of the presentation, navigation options, and content of a site (the user tells us what they want). Amazon (http://www.amazon.com) uses personalization to remember users’ names and recommend books. Yahoo! (http://www.yahoo.com) lets users customize their Yahoo! portal page.
  • Visualization-various experiments have used visualization and graphics for navigation. Some virtual museums were designed to reflect how physical museums look and operate. Grokker, a search product by Groxis, gives the user a visual navigation experience. Right now most of these are still experiments.
  • Social navigation-this type of navigation is based on the theory that users value what other users are doing. Lists of popular resources are built on this idea. Amazon (http://www.amazon.com) lists the books that other customers bought along with the book you are viewing. Flickr (http://www.flickr.com) shows its most popular tags in larger fonts.
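One rough way to approximate the “customers who bought this also bought” idea is to count which items appear in the same orders. This is a sketch using invented order data. It is not Amazon’s actual method.

```python
from collections import Counter

# Hypothetical order history: each order is the set of items bought together.
ORDERS = [
    {"Book A", "Book B"},
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book C"},
    {"Book B", "Book D"},
]


def also_bought(item, orders, top_n=2):
    """Rank other items by how often they appear in the same order as `item`."""
    counts = Counter()
    for order in orders:
        if item in order:
            counts.update(other for other in order if other != item)
    # Highest count first; break ties alphabetically so the output is stable.
    return sorted(counts, key=lambda other: (-counts[other], other))[:top_n]
```

For “Book A”, both “Book B” and “Book C” were bought with it twice, so both are suggested.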



IA Chapter 8: “Search systems”-not all sites need a search option, and search should not replace the other types of navigation systems. One must consider whether the site has enough content to warrant a search option. It also depends on the type of website. A library-like site, which stores lots of individual pieces of information, may benefit from having a search system. One must also consider the time and expertise needed to keep a search system working correctly, because search systems are very difficult to make effective. An alternative to search is a site index. It helps users who know what they are looking for and is not as difficult to maintain and manage.


Reasons to implement a search system:


  • Too much information-if your website has outgrown its browsing systems, search can help. When category pages grow too long for users to find what they need, it may be time to implement search.
  • Fragmented site-many large intranets and public sites are made up of subgroups of content that were brought together over time. These subgroups don’t all share the same metadata or the level of indexing needed for good browsing. Search can help users find information across these different units, but it won’t fix everything.
  • Learning tool-by accessing the search log you can run search analytics and get data on your users and find out what they are looking for on your site.
  • Users expect it-if you have a large site it probably needs a search option since users won’t want to go through all the site structure to find what they are looking for quickly. Nowadays, many users expect a search box to be available and use it because they are used to using search engines like Google.
  • Tame dynamism-if your site changes rapidly (daily newspaper sites may add many stories every day) there may not be the time to manually catalog or maintain a very detailed table of contents. Search can help with automatic indexing and this gives users better access to the site’s information.


Anatomy of a search:


  1. User query
  2. Search interface-(query language, query builders) a simple box or more advanced? Query language-are Boolean operators allowed? Query builders-is there a spell checker?
  3. Search engine
  4. Content-does the search engine index metadata, controlled vocabulary, just full text, or specific fields?
  5. Results-how are the results presented? What ranking and clustering algorithms are used? Is there a best bets option?
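The five parts of a search can be traced through a toy example. This Python sketch makes several assumptions. The collection is tiny and hypothetical, the tokenizer simply lowercases the text, and the query terms are joined with an implicit Boolean AND. Results come back in document-id order with no ranking.

```python
import re
from collections import defaultdict

# 4. Content: a hypothetical collection, document id -> full text.
DOCS = {
    1: "Library hours and holiday closings",
    2: "Renew library books online",
    3: "Holiday story time for children",
}


def tokenize(text):
    """Lowercase and split on non-letters (a real engine does much more)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """3. Search engine: an inverted index mapping each term to the ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(query, index):
    """1-2. Take the user's query from the interface; 5. return matching ids."""
    terms = tokenize(query)
    if not terms:
        return []
    # Implicit AND: a document must contain every query term.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)
```

With `idx = build_index(DOCS)`, `search("library", idx)` returns documents 1 and 2. `search("holiday library", idx)` narrows that to document 1.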



Often information architects are told they don’t have a say over the search system-it is an IT thing because it is a complex application. While it is important for someone to understand the technical issues like load balancing, it is also important for the search system to work for users. The information architect may better understand how the interface can be changed to meet user needs. Ideally IT personnel, IA personnel, and others (web designers, graphics people) will work collaboratively on the system.


What to search: Do you index everything together, or do you separate the information into search zones that can be searched separately? Which components of content are indexed (author, title, etc.)? Is some content “better” than other content? Do you make the “good stuff” searchable on its own, or have it rise higher in the results? All of these are important considerations.


  • Search zones can allow users to eliminate content areas that they determine will not be useful. They can be based on content type, audience, role, subject/topic, chronology, author, etc.


  • Destination pages have the information that users need. Navigation pages include main pages, search pages, and other pages that lead to the destination pages. Excluding navigation pages from search retrieval can reduce the number of results a user has to go through to get to the destination page they want.


  • Content components-by selecting certain content components to index you can make the search system more flexible and meaningful. Options such as document titles, authors, or descriptions can be very helpful.
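Search zones and the split between destination and navigation pages can both be expressed as filters applied before matching. The page records, zone names, and `type` field below are hypothetical. A real engine would store these as indexed fields rather than scan a list.

```python
# Hypothetical page records: each page belongs to a zone and has a type.
# "destination" pages hold the answer; "navigation" pages only lead onward.
PAGES = [
    {"url": "/kids/story-time", "zone": "kids", "type": "destination", "title": "Holiday story time"},
    {"url": "/kids/", "zone": "kids", "type": "navigation", "title": "Kids home: holiday events"},
    {"url": "/adults/closings", "zone": "adults", "type": "destination", "title": "Holiday closings"},
]


def zoned_search(term, pages, zone=None, destinations_only=True):
    """Match `term` in titles, optionally limited to one zone and to destination pages."""
    results = []
    for page in pages:
        if zone is not None and page["zone"] != zone:
            continue  # the user chose to leave this content area out
        if destinations_only and page["type"] != "destination":
            continue  # skip pages that merely link somewhere else
        if term.lower() in page["title"].lower():
            results.append(page["url"])
    return results
```

Searching for “holiday” in the kids zone returns only the story-time page. The kids navigation page appears only when `destinations_only=False`.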


Search algorithms-there are about 40 separate retrieval algorithms around. These are designed to solve specific problems and no search engine will meet all the needs of a user.  Most algorithms work by matching patterns (results must have the matching text string that was entered). Some algorithms return only highly relevant options and others return many more options which have a lower relevancy. They have different levels of recall and precision.


  • Recall-high recall means that you will get as close as possible to all the matching items in the collection but they may not be as relevant.


  • Precision-high precision means that only the most relevant items will be retrieved, but there will be fewer.


  • Cited by-this finds other documents which cite the document the user found.


  • Active bibliography (related documents)-this targets the bibliography of the document found. The documents it cites are retrieved as related documents.


  • Similar documents based on text-converts the found document into a query to find similar documents (looks for similar terms throughout document).


  • Related documents co-citation-this assumes that documents have a common link if they are found together in the bibliography of another paper.
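Precision and recall are simple ratios, and the citation-based approaches above are lookups on a citation graph. This sketch uses an invented citation table. It covers “cited by,” active bibliography, and co-citation, but not similarity based on text.

```python
from collections import Counter


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical citation graph: paper -> the papers in its bibliography.
CITES = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "P3": {"A", "C"},
}


def cited_by(doc, cites):
    """'Cited by': papers whose bibliography contains `doc`."""
    return sorted(paper for paper, refs in cites.items() if doc in refs)


def active_bibliography(doc, cites):
    """Related documents taken from `doc`'s own bibliography."""
    return sorted(cites.get(doc, set()))


def co_cited(doc, cites):
    """Co-citation: how often each other paper shares a bibliography with `doc`."""
    counts = Counter()
    for refs in cites.values():
        if doc in refs:
            counts.update(ref for ref in refs if ref != doc)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
```

For example, suppose a search retrieves four items, two of which are among three relevant ones. Precision is then 2/4 = 0.5 and recall is 2/3. Papers B and C are each cited alongside A twice, so co-citation treats both as related to A.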



Query builders-these can improve the performance of a search:


  • Spell-checker-misspellings are automatically corrected so the right results will still appear.
  • Phonetic tools-work well for name searching (a search for Smith will also retrieve Smyth).
  • Stemming tools-will retrieve words with the same stem as the entered term (lodge will also find lodging and lodger).
  • Natural language processing tools-analyze the syntax of a query (how to? who is?) and use it to narrow the retrieved documents.
  • Controlled vocabulary & thesauri-expand semantic nature of query and include synonyms.
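Two of these query builders are easy to sketch in Python. The stemmer below is a deliberately crude suffix stripper. It is hypothetical and far simpler than real stemmers such as Porter’s. The phonetic matcher implements the classic American Soundex code, which gives Smith and Smyth the same code.

```python
# A crude suffix-stripping stemmer (hypothetical; not the Porter algorithm).
SUFFIXES = ("ing", "er", "ed", "es", "e", "s")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Letter-to-digit table used by American Soundex.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return a four-character Soundex code (first letter plus three digits)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        # Vowels reset the previous digit; h and w do not, per the Soundex rules.
        if ch not in "hw":
            previous = digit
    return (code + "000")[:4]
```

With these, lodge, lodging, and lodger all reduce to the stem `lodg`, and both Smith and Smyth become `S530`. A search system could match on these codes instead of the raw strings.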




What part of each item do you display on the results list? Users who know what they are looking for generally need less information than users who are unsure. Some sites accommodate both. They let the user choose between displaying author and title only, or also including a brief summary or part of an abstract. It is important to display the information the user is looking for immediately rather than making them click through several screens. An online phone book, for example, shows the phone numbers in the results list instead of making the user click to another screen.


How many results will be displayed? This depends on many factors, from the user’s monitor and connection speed to the amount of information displayed for each item. It is best to show fewer results per page but give users the option to change that.


In what order are the results displayed? Items can be sorted alphabetically (by author, title, etc.) or chronologically (by date). Sorting can help users make decisions (sort by price, etc.). Items can also be ranked by relevance, based on factors such as how many query terms appear in the document, how often they appear, whether they are adjacent, and how popular the document is. Ranking is helpful when users need to understand something and are looking for the most relevant material. However, relevance is relative, and most users will assume that the documents at the top are the best. Some systems use human indexers to fill in descriptor fields and to manually select some of the better options (best bets, best links, etc.). This is expensive and is usually done only for some of the most common queries. Google uses popularity to rank its results. Others, like Digg, let users rate items and then use those ratings to order results. Some services, like Yahoo! Search Marketing, allow companies to pay to be placed higher in the results.
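Even three results are enough to show how relevance ranking differs from simple sorting. In this sketch, relevance is just term frequency, meaning how often the query terms appear. Adjacency, popularity, and human-selected best bets are left out, and the records are hypothetical.

```python
import re

# Hypothetical result set: title, publication year, and body text.
RESULTS = [
    {"title": "Garden tips", "year": 2007, "text": "tomato garden tips for a small garden"},
    {"title": "Tomato recipes", "year": 2008, "text": "tomato soup and tomato salad"},
    {"title": "Soil basics", "year": 2006, "text": "garden soil and compost"},
]


def relevance(query, text):
    """Toy score: total occurrences of the query terms in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(words.count(term) for term in query.lower().split())


def ranked(query, results):
    """Relevance ranking: highest score first, ties broken by title."""
    ordered = sorted(results, key=lambda r: (-relevance(query, r["text"]), r["title"]))
    return [r["title"] for r in ordered]


def by_date(results):
    """Chronological sort, newest first, ignoring the query entirely."""
    return [r["title"] for r in sorted(results, key=lambda r: r["year"], reverse=True)]
```

For the query “tomato garden”, `ranked` puts “Garden tips” first with a score of 3, while `by_date` puts the 2008 “Tomato recipes” first. The same result set produces different first impressions depending on the ordering the system chooses.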


Search interface:


In designing the search interface, you must consider the users’ level of expertise (Boolean operators vs. natural language, etc.). Are the users doing comprehensive research, or just getting an idea of what is out there? What kind of information is being searched (are there structured fields, tags, metadata, full text, etc.)? Most search interfaces are the standard “Google” box and a button. Try to label the button “search” rather than “go” or something else so users know they are searching. Most users will never use advanced search options, so Morville and Rosenfeld (2006) don’t recommend spending much time and money on their design. Instead, they suggest focusing on helping users refine a search when their first attempt in the search box doesn’t give them what they want. Provide tips, point them to the advanced search options, and offer ways to narrow down the results (search zones).


IA Chapter 9: “Thesauri, controlled vocabularies, and metadata”-these are becoming important building blocks of large websites. We are moving away from a single taxonomy toward more flexible faceted approaches.


Metadata-data about data. These are tags which tell us about web pages, documents, and other pieces of content for the purpose of navigation and retrieval (they make things findable). They are used by search engines to retrieve items.


Controlled vocabularies-this is basically a list of preferred terms that ensures synonyms are grouped together under one label (for example, felines under Cats). This list is called an authority file. When the relationships between terms are hierarchical (broader or narrower), it is a classification scheme. When the list also records equivalence and associative relationships (use for, see also), it is a thesaurus.
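An authority file can be thought of as a lookup table from every variant term to its preferred term. Apart from the Cats/felines pairing above, the terms below are hypothetical examples.

```python
# Hypothetical authority file: each preferred term lists its variant terms.
AUTHORITY = {
    "Cats": ["felines", "kitties", "house cats"],
    "Automobiles": ["cars", "autos"],
}

# Invert it once so any entered term can be looked up directly.
PREFERRED = {}
for preferred, variants in AUTHORITY.items():
    PREFERRED[preferred.lower()] = preferred
    for variant in variants:
        PREFERRED[variant.lower()] = preferred


def normalize(term):
    """Return the preferred term for `term`, or the term unchanged if it is not listed."""
    return PREFERRED.get(term.strip().lower(), term)
```

A search for “Felines” is then treated as a search for “Cats”, so documents tagged either way are grouped under one label.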

Classification scheme-this is a hierarchical listing of preferred terms (controlled vocabulary) and is also called taxonomy. It can be used as a browsable list that users see or it can be hidden and used on the backend by catalogers and information architects to organize, index, and tag documents. Some classic examples of course are the Dewey Decimal Classification (DDC) and the Library of Congress Classification.


Thesauri-the traditional book form helps users go from one word to many synonyms. The web based version integrated with user interfaces is more like a database. It manages all the synonyms and provides the best term for different terms that users may enter for the same thing. It identifies relationships between terms whether they are hierarchical, equivalent, or associative.


Thesauri have the capabilities to describe semantic relationships between terms not just give synonyms.


  • Equivalence-is used to link the preferred term with synonyms, near synonyms, abbreviations, etc. It may list the preferred term followed by variant terms which are roughly equal in meaning.
  • Hierarchical-this type of relationship categorizes information and links broader and narrower terms using a parent-child relationship model. Examples are generic terms (Bird NT Robin), whole-part (hand NT finger), instance – the narrower term is an example of the broader term (Seas NT South China Sea).
  • Associative-these are strongly implied related terms but are very subjective. (Hammer RT nail).
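The next sketch shows how a search system might use these relationships to expand a query. It assumes a small hypothetical thesaurus that uses the conventional UF (use for), BT, NT, and RT labels. Whether to follow narrower or related terms is a design choice. Because associative links are subjective, the sketch leaves them off by default.

```python
# Hypothetical thesaurus entries using the relationships described above.
THESAURUS = {
    "Birds": {"UF": ["fowl"], "NT": ["Robins"], "RT": []},
    "Robins": {"UF": [], "BT": ["Birds"], "RT": []},
    "Hammers": {"UF": [], "NT": [], "RT": ["Nails"]},
    "Nails": {"UF": [], "NT": [], "RT": ["Hammers"]},
}


def expand(term, include_narrower=True, include_related=False):
    """Expand a preferred term into a list of terms to search for."""
    entry = THESAURUS.get(term, {})
    terms = [term] + entry.get("UF", [])      # equivalence: variant terms
    if include_narrower:
        terms += entry.get("NT", [])          # hierarchical: one level down
    if include_related:
        terms += entry.get("RT", [])          # associative: subjective, so opt-in
    return terms
```

With these entries, a search for Birds also covers “fowl” and Robins. Hammers picks up Nails only when related terms are switched on.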


Sometimes a thesaurus will help define an ambiguous term as in: cells (biology), cells (electric), cells (prison), etc.


Facets: In the 1930s, Ranganathan had the idea of creating a different type of classification system, one that recognized the problems with a single top-down hierarchy. Instead of asking “Where do I put this?” he asked, “How can this be described?” Instead of one taxonomy for everything, we can create multiple taxonomies for different facets of content. An excellent example is wine.com (http://www.wine.com). They created multiple facets of wine that can be browsed: type, region, winery, year, and price. These can also be combined so the user can specify exactly what (s)he is looking for, such as a Riesling (white wine) from Germany that is under $30.


This type of faceted search is extremely powerful and flexible. Library catalogs often offer something similar through the fields one can search. A user might look for a book (rather than AV material) by a specific author on a specific subject, or for a specific subject by any author, written after 1990, in any format.
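Facets behave like independent filters that can be combined. The wine records below are invented and are not wine.com data. The price test treats “under $30” as strictly less than 30.

```python
# Hypothetical wine records, each described along several independent facets.
WINES = [
    {"name": "Wine 1", "type": "white", "varietal": "Riesling", "region": "Germany", "year": 2006, "price": 24},
    {"name": "Wine 2", "type": "white", "varietal": "Riesling", "region": "Germany", "year": 2005, "price": 38},
    {"name": "Wine 3", "type": "white", "varietal": "Riesling", "region": "Australia", "year": 2007, "price": 18},
    {"name": "Wine 4", "type": "red", "varietal": "Merlot", "region": "France", "year": 2004, "price": 29},
]


def facet_filter(items, max_price=None, **facets):
    """Keep items matching every chosen facet value; facets not chosen mean 'any'."""
    results = []
    for item in items:
        if all(item.get(name) == value for name, value in facets.items()):
            if max_price is None or item["price"] < max_price:
                results.append(item["name"])
    return results


def facet_counts(items, facet):
    """Count items under each value of one facet, as a browsable facet list would."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts
```

`facet_filter(WINES, varietal="Riesling", region="Germany", max_price=30)` narrows the list to one wine. `facet_counts(WINES, "region")` gives the numbers a site might show next to each region link.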


Websites reviewed:


“How Much Information?” http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/ 

Pew Internet & American Life Project http://www.pewinternet.org/ 

CAIDA http://www.caida.org/home/ 


“How Much Information?” http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/


This 2003 UC Berkeley study measured how much new information is generated each year. The results are divided into two groups. The first is information stored on media (print, film, magnetic, and optical). The second is information seen and heard in information flows (telephone, radio, TV, and the Internet). The 2002 data were analyzed and compared with the earlier 2000 study, which used 1999 data.


In the storage media results (print, film, magnetic, and optical), about 5 exabytes of new information were generated in 2002. That is roughly equivalent to the information in 37,000 libraries the size of the Library of Congress book collection! Of this new information, 92% was stored on magnetic media, mostly hard disks.


Another striking statistic is that the US produces about 40% of the world’s new stored information (33% of print, 30% of film, 40% of optical, and 50% of magnetic). With the world population estimated at 6.3 billion, the 5 exabytes of new stored information work out to about 800 MB per person per year. Printed on paper, that would be about 30 feet of books per person per year (Swearingen et al., 2003).
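The per-person figure can be checked with quick arithmetic. This assumes decimal units, where 1 exabyte = 10^18 bytes and 1 MB = 10^6 bytes:

```latex
\[
\frac{5 \times 10^{18}\ \text{bytes}}{6.3 \times 10^{9}\ \text{people}}
\approx 7.9 \times 10^{8}\ \text{bytes per person}
\approx 800\ \text{MB per person per year}
\]
```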


Obviously this is a lot of information, and it is growing faster every year. Someone has to organize all these exabytes, and professional librarians clearly cannot do it alone. The data added to the Internet by itself is too much for any one group to take on, hence the research and interest in automated search engines, automated metadata, and tagging.



Pew Internet & American Life Project http://www.pewinternet.org/


This is a national project run by the Pew Research Center (the folks who conduct tons of surveys about just about everything). It studies important aspects of the internet, including adoption and usage statistics. Several reports discussed privacy issues. One concerned web-based word processing and other knowledge and information applications that people use to store their work online. People liked having access to their work from any computer with an internet connection, but they wanted that work kept private and not shared. Another report I looked at (see below) also indicated that people have privacy concerns.


I took a look at one of the reports on the Internet’s evolution, The Future of the Internet II (Anderson & Rainie, 2006). It surveyed internet leaders and analysts about their predictions for the year 2020. Most respondents felt that the global network would be low-cost and thriving. However, 42% were worried about humans’ ability to control the technology in the future. Many felt that dangers and dependencies will grow beyond our ability to stay in charge of technology, and some felt that new addiction problems will develop. Another interesting prediction was that people will keep disclosing more and more personal information, both knowingly and unknowingly. Some benefits will be gained, but privacy will be lost. Respondents also disagreed over whether the greater transparency of people and institutions brought by the internet will make the world a better place in 2020. Of those surveyed, 46% thought the benefits of transparency will outweigh the loss of privacy, but 49% disagreed.


CAIDA http://www.caida.org/home/


CAIDA (the Cooperative Association for Internet Data Analysis) is an international collaborative group that analyzes Internet data. It provides tools, research, and analyses with the goal of maintaining and engineering the massive global Internet infrastructure.


I took a look at the Day in the Life of the Internet 2008 (DITL 2008) analysis that Sebastian Castro (2008) presented at a June 2008 workshop. Its “general stats” section compares DITL 2007 with DITL 2008. Both the query count (3.84 billion in 2007 vs. 7.56 billion in 2008) and the number of unique clients (2.8 million vs. 5.6 million) roughly doubled. Keep in mind that these statistics cover a single 24-hour period! Much of the technical jargon was beyond me. The analysis found that the number of AAAA queries (DNS lookups for IPv6 addresses) is increasing. It also found that about 25% of the queries involved an invalid TLD. TLD stands for top-level domain, which is whatever comes after the final dot in a domain name (in www.example.com the top-level domain is com). If a quarter of these queries are for top-level domains that don’t exist, that is a lot of superfluous traffic that could perhaps be eliminated.
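The following sketch makes the “whatever comes after the final dot” definition concrete. It extracts the TLD from a domain name and measures the share of names whose TLD is not recognized. The `VALID_TLDS` set and the sample names are hypothetical and far smaller than the real list of top-level domains. This is not how the DITL analysis itself was computed.

```python
# Hypothetical, deliberately tiny set of valid TLDs (the real list is much longer).
VALID_TLDS = {"com", "org", "net", "edu", "gov", "uk", "de"}


def tld(hostname):
    """Return whatever follows the final dot, ignoring a trailing root dot."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def invalid_share(names):
    """Fraction of query names whose TLD is not in VALID_TLDS."""
    if not names:
        return 0.0
    bad = sum(1 for name in names if tld(name) not in VALID_TLDS)
    return bad / len(names)
```

For example, `tld("www.example.com")` returns `com`. In the sample `["www.example.com", "printer.local", "mail.example.org", "router.belkin"]`, two of the four names end in a TLD that is not on the list, so the share is 0.5.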


The following topology map from the CAIDA site (http://www.caida.org/research/topology/as_core_network/) is a visual representation of the internet’s structure over about two weeks (January 1-17, 2008). There is a lot of internet traffic out there, all of it creating, sharing, using, and hunting for information.


Visual representation of 2 weeks of the internet


Week 4 references:


Anderson, J., & Rainie, L. (2006, September 24). The future of the Internet II. In Pew Internet & American Life Project. Retrieved September 17, 2008, from Pew Research Center Web site: http://www.pewinternet.org/PPF/r/188/report_display.asp


Castro, S. (2008, June). DITL 2008 analysis. In CAIDA. Retrieved September 19, 2008, from CAIDA Web site: http://www.caida.org/publications/presentations/2008/oarc_castro_ditlanalysis/index.xml


Morville, P., & Rosenfeld, L. (2006). Information architecture for the World Wide Web (3rd ed.). Sebastopol, CA: O’Reilly.


Swearingen, K., Charles, P., Good, N., Jordan, L. L., & Pal, J. (2003, October 27). How much information? 2003. Retrieved September 15, 2008, from University of California Berkeley Web site: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/