A personal knowledge base (PKB) is an electronic tool used to express, capture, and later retrieve the personal knowledge of an individual. It differs from a traditional database in that it contains subjective material particular to the owner, which others may neither agree with nor care about. Importantly, a PKB consists primarily of knowledge, rather than information; in other words, it is not a collection of documents or other sources an individual has encountered, but rather an expression of the distilled knowledge the owner has extracted from those sources.
Definition
The term personal knowledge base was mentioned as early as the 1980s, but the term came to prominence when it was described at length in publications by computer scientist Stephen Davies and colleagues, who defined the term as follows:
- personal: a PKB is intended for private use, and its contents are custom-tailored to the individual. It contains trends, relationships, categories, and personal observations that its owner perceives but which no one else may agree with. It can be shared, just as one can explain one's own opinion to a hearer, but it is not jointly owned by anyone else any more than explaining one's opinion to a friend causes the friend to own one's mind.
- knowledge: a PKB contains knowledge, not merely information. Its purpose is not simply to aggregate all the information sources one has seen, but to preserve the knowledge that one has learned from those sources. When a user returns to a PKB to retrieve knowledge she has stored, she is not merely pointed back to the original documents, where she must relocate, reread, reparse, and relearn the relevant passages. Instead, she is returned to the distilled version of the particular truth she is seeking, so that the mental model she originally had in mind can be easily reformed.
- base: a PKB is a consolidated, integrated knowledge store. It is a reflection of its owner's memory, which, as Bush and many others have observed, can freely associate any two thoughts together, without restriction. Hence a PKB does not attempt to partition a user's field of knowledge into multiple segments that cannot reference one another. Rather, it can connect any two concepts without regard for artificial boundaries, and acts as a single, unified whole.
Contrast with other classes of systems
The following classes of systems cannot be classified as PKBs:
- collaborative efforts to build a universal objective space (as opposed to an individual's personal knowledge). The World Wide Web itself is in this category, as were its predecessors HyperTIES and Xanadu, Web categorization systems like the Open Directory Project, and collaborative information collections like Wikipedia.
- search systems like Enfish and the Stuff I've Seen project that index and search one's information sources on demand, but do not give the user the ability to craft and express personal knowledge.
- tools whose goal is to produce a design artifact rather than to maintain knowledge for its own sake. Systems like ART and Writing Environment use intermediate knowledge representations as a means to an end, abandoning them once a final artifact has been produced, and hence are not suitable as PKBs.
- systems that focus on capturing transient information, rather than archiving knowledge that has long-term value. Examples would be Web logs and e-diaries.
- tools whose information domain is mostly limited to time management tasks (calendars, action items, contacts, etc.) rather than "general knowledge". Blandford and Green and Palen give excellent surveys; common commercial examples would be Microsoft Outlook, Lotus Notes, and Novell Evolution.
- similarly, tools developed for a specific domain, such as bibliographic research rather than for "general knowledge".
Personal information management
Personal knowledge management (PKM) is similar to personal information management (PIM), but is a distinct topic based on the "information" vs. "knowledge" difference. PKBs are about recording and managing the knowledge one derives from documents, whereas PIM is more about managing and retrieving the documents themselves.
Historical influences
Non-electronic personal knowledge bases have probably existed in some form since the dawn of written language: Da Vinci's notebooks are a famous example. More commonly, card files and personal annotated libraries have served this function in the pre-electronic age.
Bush's Memex
Undoubtedly the most famous early formulation of an electronic PKB was Vannevar Bush's description of the "Memex" in 1945. Bush surveyed the post-World-War-II landscape and laid out what he viewed as the most important forthcoming challenges to humankind in The Atlantic Monthly. The Memex was a theoretical (never implemented) design for a system to help tackle the information overload problem, already formidable in 1945. In Bush's own words:
Consider a future device for individual use, which is a sort of mechanized private file and library. ... [A] device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
Bush envisioned collaborative aspects as well, and even a worldwide system that scientists could freely consult. But an important emphasis throughout the article was on expanding our own powers of recollection: "Man needs to mechanize his record more fully," he says, if he is not to "become bogged down...by overtaxing his limited memory". With the Memex, the user could "add marginal notes and comments," and "build a trail of his interest" through the larger information space. The user could share trails with friends, identify related works, and create personal annotations. Bush's Memex would give each individual the ability to create, categorize, classify, and relate his own set of information corresponding to his unique personal viewpoint. Much of that information would in fact consist of bits and pieces from public documents, just as the majority of the knowledge inside our own heads has been imbibed from what we read and hear. But the Memex also allowed for the specialized recording of information that each individual perceived and needed to retain. The idea of "supplementing our memory" was not a one-size-fits-all proposition, since no two people have the same interests, opinions, or memories. Instead, it demanded a subjective expression of knowledge, unique to each individual.
Graphical knowledge capture tools
Great emphasis is placed on the pictorial nature of diagrams to represent abstract knowledge; the use of spatial layout, color, and images is said to strengthen understanding and promote creativity. Each of the three primary schools--mind mapping, concept mapping, and cognitive mapping--prescribes its own data model and procedures, and each boasts a number of software applications designed specifically to create compatible diagrams.
Mind mapping
Mind mapping was promoted by pop psychologist Tony Buzan in the 1960s, and commands the allegiance of an impressive number of adherents worldwide. A mind map is essentially nothing more than a visual outline, in which a main idea or topic is written in the center of the diagram, and subtopics radiate outwards in increasing levels of specificity. The primary value is in the freeform, spatial layout (rather than a sequential, numbered outline), the ability for a software application to hide or reveal select levels of detail, and as mentioned above, graphical adornments. The basic data model is a tree, rather than a graph, with all edges implicitly labeled "supertopic/subtopic". Numerous tools are available for constructing mind maps.
Concept mapping
Concept maps were developed by Cornell Professor Joseph Novak, and based on David Ausubel's assimilation theory of learning. An essential tenet is that newly encountered knowledge must be related to one's prior knowledge in order to be properly understood. Concept maps help depict such connections graphically. Like mind maps, they feature evocative words or phrases in boxes connected by lines. There are two principal differences, however: first, a concept map is properly a graph, not a tree, permitting arbitrary links between nodes rather than only parent/child relationships; and second, the links are labeled to identify the nature of the inter-concept relationship, typically with a verb phrase. In this way, the links on a diagram can be read as English sentences, with the upstream node as the subject and the downstream node as the direct object of the sentence.
There are many applications available that could be used for drawing these diagrams, not all of which directly acknowledge their support for concept maps in particular.
A concept map is virtually identical to the notion of a "semantic network", which has served as a cornerstone for much artificial intelligence work since its inception. Semantic networks, too, are directed graphs in which the nodes represent concepts and labeled edges the relationships between them. Much psychology research has strengthened the idea that the human mind internalizes knowledge in something very like this sort of framework. This likely explains the ease with which concept mapping techniques have been adopted by the uninitiated, since concept maps and semantic networks can be considered equivalent.
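The way labeled links can be read as sentences may be easier to see in code. The following is a minimal sketch, not taken from any concept-mapping tool: the `Link` type, the helper names, and the sample content are all invented for illustration.

```python
from typing import NamedTuple

class Link(NamedTuple):
    source: str  # upstream concept (subject of the sentence)
    label: str   # verb phrase naming the relationship
    target: str  # downstream concept (direct object)

# Hypothetical example content, invented for illustration.
concept_map = [
    Link("concept maps", "are based on", "assimilation theory"),
    Link("new knowledge", "must be related to", "prior knowledge"),
    Link("concept maps", "depict", "prior knowledge"),
]

def read_as_sentences(links):
    """Render each labeled link as a simple English sentence."""
    return [f"{link.source} {link.label} {link.target}." for link in links]

def every_node_has_single_parent(links):
    """True only if no concept is the target of more than one link.

    A tree-shaped mind map satisfies this; a concept map need not.
    """
    targets = [link.target for link in links]
    return len(targets) == len(set(targets))

sentences = read_as_sentences(concept_map)
```

Here "prior knowledge" is the target of two links, so `every_node_has_single_parent(concept_map)` is false: the structure is a graph rather than a tree.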
Cognitive mapping
Cognitive mapping, developed by Fran Ackermann and Colin Eden at the University of Strathclyde, uses the same data model as does concept mapping, but with a new set of techniques. In cognitive maps, element names have two parts, separated by an ellipsis that is read "as opposed to" in order to further clarify the semantics of the node. ("Cold...hot" is different from "cold...freezing," for example.) Links are of three types--causal, temporal, connotative--the first of which is the most common and is read as "may lead to". Generally cognitive mapping is best suited to domains involving arguments and decision making. Cognitive mapping is not nearly as widespread as the other two paradigms.
Together, these and related methods have brought into the mainstream the idea of breaking down knowledge into its fundamental elements, and representing them graphically. Students and workers from widely diverse backgrounds have experienced success in better articulating and examining their own knowledge, and in discovering how it relates to what else they know. Although architectural considerations prevent any of these tools from functioning as bona fide PKBs, the ideas they have contributed to a front-end interface mechanism cannot be overestimated.
Hypertext systems
Many in the hypertext community reference Vannevar Bush's article as the cornerstone of their heritage. Hence the development of hypertext techniques, while seldom applied specifically towards PKB solutions, is important. There have basically been three types of hypertext systems: those that exploit features of non-linear text to create a dynamic, but coherent "hyperdocument"; those that prescribe ways of linking existing documents together for navigation and expression of affinities; and those that use the hypertext model specifically to model abstract knowledge. Though the first and especially the second category have dominated research efforts (and public enthusiasm) over the past several decades, it is this third class that is closest in spirit to the original vision of hypertext by its founders.
In a similar vein to Bush, Doug Engelbart's focus was to develop computer systems to "help people think better". He sought data models that more closely paralleled the human thought process, and settled on using hypertext as a way to represent and store abstract human knowledge. Although his "Augment" system underwent many changes, the original purpose closely aligned with that of PKBs.
More recently, Randall Trigg's TextNet and NoteCards systems further explored this idea. TextNet revolved around "primitive pieces of text connected with typed links to form a network similar in many ways to a semantic network". Though text-centric, it was clear that Trigg's goal was to model the associations between primitive ideas and hence to reflect the mind's understanding. "By using...structure, meaning can be extracted from the relationships between chunks (small pieces of text) rather than from the words making them up." The subsequent NoteCards effort was similarly designed to "formulate, structure, compare, and manage ideas". It was useful for "analyzing information, constructing models, formulating arguments, designing artifacts, and generally processing ideas".
Conklin and Begeman's gIBIS system was another early effort into true knowledge representation, specifically for the field of design deliberations and arguments. The project lived on in the later project QuestMap and the more modern Compendium, which has been primarily used for capturing group knowledge expressed in face-to-face meetings. In all these cases, systems use semantic hypertext in an attempt to capture shared knowledge in its most basic form. Other examples of knowledge-based hypertext tools include Mental Link, Aquanet, and SPRINT, as well as a few current commercial tools such as PersonalBrain and Tinderbox and open source tools such as TiddlyWiki.
Note-taking applications
Note-taking applications allow a user to create snippets of text and then organize or categorize them in some way. These tools can be used to form PKBs that are composed of such text snippets.
Most of these tools are based on a tree hierarchy, in which the user can write pages of notes and then organize them into sections and subsections. The higher level sections or chapters often receive a colored tab exactly as a physical three-ring notebook might. Other designers eschew the tree model for a more flexible category-based approach (see section data models). The primary purpose of all these tools is to offer the benefits of freeform note-taking with none of the deficiencies: users are free to brainstorm and jot down anything from bullet points to polished text, while still being able to search, rearrange, and restructure the entire notebook easily.
An important subcategory of note-taking tools is outliners (e.g., OmniOutliner), or applications specifically designed to organize ideas in a hierarchy. These tools typically show a two-pane display with a tree-like navigation widget in the left pane and a list of items in the right pane. Topics and subtopics can be rearranged, and each outline stored in its own file. Modern outliners feature the ability to add graphics and other formatting to an item, and even hyperlinks to external websites or documents. The once abandoned (but now resurrected) Ecco system was among the first to allow items to have typed attributes, displayed in columns. This gives the effect of a custom spreadsheet per topic, with the topic's items as rows and the columns as attributes. It allows the user to gracefully introduce structure to their information as it is identified.
Of particular interest are applications optimized for subsuming portions of an information space into a PKB, where they can be clustered and arranged according to the user's own perceptions. The Virtual Notebook System (VNS) was one of the first to emphasize this. VNS was designed for sharing information among scientists at the Baylor College of Medicine; a user's "personal notebook" could make references to specific sections of a "community notebook," and even include arbitrary segments of other documents through a cut-and-paste mechanism.
Document management systems
Another influence on PKBs is the class of systems whose primary purpose is to help users organize documents, rather than personal knowledge derived from those documents. Such systems do not encode subjective knowledge per se, but they do create a personal knowledge base of sorts by allowing users to organize and cross-reference their information artifacts.
These efforts provide alternative indexing mechanisms to the limited "directory path and file name" approach. Presto replaces the directory hierarchy entirely with attributes that users assign to files. These key-value pairs represent user-perceived properties of the documents, and are used as a flexible means for retrieval and organization. William Jones' Memory Extender was similar in spirit, but it dynamically varied the "weight" of a file's keywords according to the user's context and perceived access patterns. In Haystack, users--in conjunction with automated software agents--build a graph-based network of associative links through which documents can be retrieved.
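Attribute-based retrieval of the kind described for Presto can be illustrated with a small sketch. This is not Presto's actual design or API; the file names, attribute keys, and the `find` helper are hypothetical, and a plain dictionary stands in for whatever storage a real system would use.

```python
# Hypothetical file names and attributes, invented for illustration.
documents = {
    "budget.xls": {"project": "website", "kind": "spreadsheet", "status": "draft"},
    "launch-plan.doc": {"project": "website", "kind": "memo", "status": "final"},
    "trip-notes.txt": {"project": "conference", "kind": "memo", "status": "draft"},
}

def find(docs, **wanted):
    """Return names of documents whose attributes match every requested pair."""
    return sorted(
        name
        for name, attrs in docs.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )

website_files = find(documents, project="website")
draft_memos = find(documents, kind="memo", status="draft")
```

The same document can be reached through any combination of its attributes, which is the flexibility a single directory path does not offer.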
Metadata and multiple categorization can also be applied to provide multiple retrieval paths customized to the way the individual thinks and works with their information sources. WebTop allowed the user to create explicit links between documents, but then also merged these user-defined relationships with other types of associations. These included the hyperlinks contained in the documents, associations implied by structural relationships, and content similarities discovered by text analysis. The idea was that any way in which items can be considered "related" should be made available to the user for help with retrieval.
A subclass of these systems integrates the user's personal workspace with a search facility, blurring the distinction between information retrieval and information organization. SketchTrieve, DLITE, and Garnet each materialized elements from the retrieval domain (repositories, queries, search results) into tangible, manipulatable screen objects. These could be introduced directly into a spatial layout that also included the information sources themselves. These systems can be seen as combining a spatial hypertext interface as in VIKI with direct access to digital library search facilities. NaviQue was largely in the same vein, though it incorporated a powerful similarity engine to proactively aid the user in organization. CYCLADES let users organize Web pages into folders, and then attempted to infer what each folder "means" to that user, based on a statistical textual analysis of its contents. This helps users locate other items similar to what's already in a folder, learn what other users have found interesting and have grouped together, etc.
All of these document management systems are principally concerned with organizing objective information sources rather than the expression of subjective knowledge. Yet their methods are useful to consider with respect to PKB systems, because such a large part of our knowledge comprises things we remember, assimilate, and repurpose from objective sources. Search environments like SketchTrieve, as well as snippet gatherers like YellowPen, address an important need in knowledge management: bridging the divide between the subjective and objective realms, so that the former can make reference to and bring structure to the latter.
Claims and benefits
PKB systems make various claims about the advantages of using them. These can be classified as follows:
- Knowledge generation and formulation. Here the emphasis is on procedure, not persistence; it is the act of simply using the tool to express one's knowledge that helps, rather than the ability to retrieve it later.
- Knowledge capture. PKBs do not merely allow one to express knowledge, but also to capture it before it elusively disappears. Often the emphasis is on a streamlined user interface, with few distractions and little encumbrance. The point is to lower the burden of jotting down one's thoughts so that neither task nor thought process is interrupted.
- Knowledge organization. A 2003 study on note-taking habits found that "better organization" was the most commonly desired improvement in people's own information recording practices.
- Knowledge management and retrieval. Perhaps the most critical aspect of a PKB is that the knowledge it stores is permanent and accessible, ready to be retrieved at any later time.
- Integrating heterogeneous sources. Recognizing that the knowledge people form comes from a variety of different places, many PKB systems emphasize that the information from diverse sources and of different types can be integrated into a single database and interface.
Data models
PKB systems can be compared along a number of different axes, the most important of which is the underlying data model they support. This is what prescribes and constrains the nature of the knowledge they can contain: what types of knowledge elements are allowed, how they can be structured, and how the user perceives them and can interact with them.
Three aspects of data models can be identified: the structural framework, which prescribes rules about how knowledge elements can be structured and interrelated; the knowledge elements themselves, or basic building blocks of information that a user creates and works with; and schema, which involves the level of formal semantics introduced into the data model.
Structural frameworks
The following structural frameworks have been featured in one or more prominent PKB systems.
Tree
Systems that support a tree model allow knowledge elements to be organized into a containment hierarchy, in which each element has one and only one "parent". This takes advantage of the mind's natural tendency to classify objects into groups, and to further break up each classification into subclassifications. It also mimics the way that a document can be broken up into chapters, sections, and subsections. It tends to be natural for users to understand.
All of the applications for creating Buzan mind maps are based on a tree model, because a mind map is a tree. Each mind map has a "root" element in the center of the diagram (often called a "main topic") from which all other elements emanate as descendants. Every knowledge element has one and only one place in this structure. Some tools, such as MindManager, extend this paradigm by introducing "floating topics", which are not anchored to the hierarchy, and permitting "crosslinks" to arbitrary topics, similar to those in concept maps.
Other examples of tree-based systems are most personalized search interfaces, outliners, and most of the "notebook-based" note-taking systems. By allowing users to partition their notes into sections and subsections, note-taking tools channel them into a tree hierarchy. In recognition of this confining limitation, many of these tools also permit a kind of "crosslink" between items, or employ some form of transclusion (see below) to allow items to co-exist in several places. The dominant paradigm in such tools, however, remains the simple parent-child hierarchy.
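The "one and only one parent" rule of the tree model can be sketched as follows. The `TreeNode` class and the notebook contents are hypothetical and do not reflect any particular product; the point is only that moving an element detaches it from its old parent, so it always has exactly one place in the hierarchy.

```python
class TreeNode:
    """A knowledge element in a containment hierarchy (illustrative only)."""

    def __init__(self, title, parent=None):
        self.title = title
        self.parent = None
        self.children = []
        if parent is not None:
            parent.add_child(self)

    def add_child(self, child):
        # Refuse to place a node inside itself or one of its own descendants.
        node = self
        while node is not None:
            if node is child:
                raise ValueError("a node cannot contain itself or its ancestors")
            node = node.parent
        # Detach from the old parent so the single-parent rule always holds.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

    def path(self):
        """Titles from the root down to this node."""
        node, titles = self, []
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return list(reversed(titles))

# Hypothetical notebook structure.
notebook = TreeNode("Notebook")
travel = TreeNode("Travel", notebook)
work = TreeNode("Work", notebook)
packing = TreeNode("Packing list", travel)
```

Calling `work.add_child(packing)` would move the packing list out of "Travel" entirely; the tree has no way to keep it in both sections, which is the limitation that crosslinks and transclusion are meant to relieve.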
Graph
Graph-based systems allow users to create knowledge elements and then to interconnect them in arbitrary ways. The elements of a graph are traditionally called "vertices," and connected by "arcs," though the terminology used by graph-based systems varies widely (see Table 1) and the hypertext community normally uses the terms "nodes" and "links". There are no restrictions on how many arcs one vertex can have with others, no notion of a "parent/child" relationship between vertices (unless the user chooses to label an arc with those semantics), and normally no "root" vertex. In many systems, arcs can optionally be labeled with a word or phrase indicating the nature of the relationship, and adorned with arrowheads on one or both ends to indicate navigability. (Neither of these adornments is necessary with a tree, since all relationships are implicitly labeled "parent/child" and are navigable from parent to child.) A graph is a more general form of a tree, and hence a strictly more powerful form of expression.
This model is the defining characteristic of hypertext systems including many of those used for document management. It is also the underpinning of all concept-mapping tools, whether they actually acknowledge the name "concept maps" or advertise themselves simply as tools to draw knowledge diagrams. As mentioned previously, graphs draw their power from the fact that humans are thought to model knowledge as graphs (or equivalently, semantic networks) internally. In fact, it could be argued that all human knowledge can be ultimately reduced to a graph of some kind, which argues strongly for its sufficiency as a structural framework.
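A minimal sketch of the graph model described above, with optional arc labels and one-way or two-way navigability. The `KnowledgeGraph` class, its method names, and the sample arcs are assumptions made for illustration rather than the design of any existing tool.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Vertices joined by optionally labeled, directed arcs (illustrative only)."""

    def __init__(self):
        self.vertices = set()
        self._arcs = defaultdict(list)  # source -> [(label, target), ...]

    def add_arc(self, source, target, label=None, bidirectional=False):
        self.vertices.update((source, target))
        self._arcs[source].append((label, target))
        if bidirectional:
            # An arrowhead on both ends: navigable from either side.
            self._arcs[target].append((label, source))

    def neighbors(self, vertex, label=None):
        """Vertices navigable from `vertex`, optionally filtered by arc label."""
        return [
            target
            for arc_label, target in self._arcs.get(vertex, [])
            if label is None or arc_label == label
        ]

# Hypothetical content loosely based on topics from this article.
graph = KnowledgeGraph()
graph.add_arc("Memex", "hypertext", label="inspired")
graph.add_arc("Memex", "Vannevar Bush", label="described by")
graph.add_arc("hypertext", "semantic network", label="resembles", bidirectional=True)
```

Nothing here designates a root or limits how many arcs a vertex may have, in contrast to the tree model.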
An interesting aspect of graph-based systems is whether or not they require a fully connected graph. A fully connected graph is one in which every vertex can be reached from any other by simply performing enough arc traversals. There are no "islands" of vertices that are severed from each other. Most graph-based tools allow non-fully-connected graphs: knowledge elements are added to the system, and connected arbitrarily to each other, without constraint. But a few tools, such as PersonalBrain and Compendium, actually require a single network of information in which every knowledge element must be indirectly connected to every other. If one attempts to remove the last link that connects a body of nodes to the original root, the severed elements are either "forgotten" or else moved to a deleted objects heap where they can only be accessed by restoring a connection to the rest of the graph.
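The connectivity requirement can be checked with an ordinary breadth-first traversal. The sketch below is an assumption about how such a check might look, not the behavior of PersonalBrain or Compendium; the function names and sample data are invented. (In graph theory the property the text calls "fully connected" is usually just called "connected".)

```python
from collections import deque

def reachable(links, root):
    """Return the set of nodes reachable from `root`, ignoring link direction."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def remove_link(nodes, links, link, root):
    """Remove one link and report which nodes are cut off from the root."""
    remaining = [other for other in links if other != link]
    severed = set(nodes) - reachable(remaining, root)
    return remaining, severed

# Hypothetical knowledge elements.
nodes = {"home", "projects", "garden", "tomatoes"}
links = [("home", "projects"), ("home", "garden"), ("garden", "tomatoes")]

_, severed = remove_link(nodes, links, ("home", "garden"), root="home")
```

After this call `severed` contains "garden" and "tomatoes". A tool that enforces connectivity would have to discard those elements, park them in a deleted-objects area, or refuse the removal.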
Some hypertext systems add precision to the basic linking mechanism by allowing nodes to reference not only other nodes, but sections within nodes. This ability is especially useful if the nodes themselves contain sizeable content, and also for PKB elements making reference to fragments of objective sources.
Tree plus graph
Although graphs are a strict superset of trees, trees offer some important advantages in their own right: simplicity, familiarity, ease of navigation, and the ability to conceal details at any level of abstraction. Indeed, the problem of "disorientation" in hypertext navigation largely disappears with the tree model; one is never confused about "where one is" in the larger structure, because traversing the parent hierarchy gives the context of the larger surroundings. For this reason, several graph-based systems have incorporated special support for trees as well, to combine the advantages of both approaches. For instance, in concept mapping techniques, a generally hierarchical paradigm is prescribed, after which users are encouraged to identify "crosslinks" between distant concepts. Similarly, some systems using the mind mapping paradigm permit arbitrary relationships between nodes.
One of the earliest systems to combine tree and graph primitives was TextNet, which featured two types of nodes: "chunks" (which contained content to be browsed and organized) and "table of contents" nodes (or "tocs"). Any node could freely link to any other, permitting an unrestricted graph. But a group of tocs could be combined to form a tree-like hierarchy that bottomed out in various chunk nodes. In this way, any number of trees could be superimposed upon an arbitrary graph, allowing it to be viewed and browsed as a tree, with all the requisite advantages. Strictly speaking, a network of tocs formed a directed acyclic graph (DAG) rather than a tree. This means that a "chunk" could be represented in multiple places in the tree, if two different traversal paths ended up referring to the same chunk. A DAG is essentially the result of applying transclusion to the tree model. The same is true of NoteCards, which offered a similar mechanism, using "FileBoxes" as the tree component that was overlaid upon the semantic network of notecards.
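The idea of superimposing a table-of-contents hierarchy on shared chunks can be sketched as below. This is not TextNet's or NoteCards' implementation; the `Chunk` and `Toc` classes, the `outline` function, and the sample notes are hypothetical. Free-form graph links between chunks are reduced to a plain list to keep the example short.

```python
class Chunk:
    """A piece of content; may be referenced from several tocs."""

    def __init__(self, text):
        self.text = text
        self.links = []  # arbitrary links to other nodes (the graph part)

class Toc:
    """A table-of-contents node whose entries are tocs or chunks."""

    def __init__(self, title, entries=()):
        self.title = title
        self.entries = list(entries)

def outline(node, depth=0):
    """Flatten a toc hierarchy into (depth, label) rows.

    A chunk shared by two tocs appears once per path that reaches it,
    which is what makes the structure a DAG rather than a tree.
    """
    if isinstance(node, Chunk):
        return [(depth, node.text)]
    rows = [(depth, node.title)]
    for entry in node.entries:
        rows.extend(outline(entry, depth + 1))
    return rows

# Hypothetical notes.
trails = Chunk("Memex stores trails")
typed = Chunk("Links can be typed")
typed.links.append(trails)  # a graph link independent of any toc

history = Toc("History", [trails])
linking = Toc("Linking", [trails, typed])
root = Toc("Hypertext notes", [history, linking])

rows = outline(root)
```

The single `trails` object shows up under both "History" and "Linking" in `rows`, yet editing its text would change both places at once, since there is only one chunk.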
Brown University's IGD project explored various ways to combine and display unrestricted graphs with hierarchy, and used a visual metaphor of spatial containment to convey both graph and tree structure. Their notion of "link inheritance" simplifies the way in which complex dual structures are displayed while still faithfully depicting their overall trends. Commercially, both PersonalBrain and Multicentrix provide explicit support for parent/child relationships in addition to arbitrary connections between elements, allowing tree and graph notions to coexist. Some note-taking tools, while essentially tree-based, also permit crosslinks between notes.
Spatial
Some designers have shunned links between elements altogether, favoring instead spatial positioning as the sole organizational paradigm. Capitalizing on the human's tendency to implicitly organize through clustering, making piles, and spatially arranging, some tools offer a 2D workspace for placing and grouping items. This provides a less formal (and perhaps less intimidating) way for a user to gradually introduce structure into a set of items as it is discovered.
This approach originated in the spatial hypertext community and was demonstrated in various projects, including VIKI/VKB. With these programs, users place information items on a canvas and can manipulate them to convey organization imprecisely. Some projects could infer structure from a user's freeform layout: a spatial parser examines which items have been clustered together, colored or otherwise adorned similarly, etc., and makes judgments about how to turn these observations into machine-processible assertions. Others, such as Pad, allowed users to view different objects in varying levels of detail as they panned around the workspace.
Certain note-taking tools combine an overarching tree structure with spatial freedom on each "frame" or "page". Users can access a particular page of the notebook with basic search or tree navigation facilities, and then lay out notes and images on the page as desired. Many graph-based approaches (such as concept mapping tools) also allow for arbitrary spatial positioning of elements. This allows both kinds of relationships to be expressed: explicit links and less formal expression through creative use of the screen.
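One very simple way a program might infer groups from a freeform layout is to chain together items that sit close to one another. The sketch below is an assumption for illustration only: it is not the spatial parser of VIKI/VKB or any other system, and the function name, threshold, item names, and coordinates are invented. Real spatial parsers also consider cues such as color and alignment, which are omitted here.

```python
from math import dist

def cluster_by_proximity(positions, threshold):
    """Group items whose positions chain together within `threshold` units."""
    unvisited = set(positions)
    clusters = []
    while unvisited:
        seed = min(unvisited)  # alphabetical pick keeps the result deterministic
        unvisited.remove(seed)
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {
                other
                for other in unvisited
                if dist(positions[current], positions[other]) <= threshold
            }
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

# Hypothetical canvas coordinates (x, y) for five notes.
positions = {
    "flight": (10, 10),
    "hotel": (14, 12),
    "visa": (12, 18),
    "tax form": (200, 40),
    "receipts": (205, 44),
}

piles = cluster_by_proximity(positions, threshold=10)
```

With these coordinates the three travel notes form one pile and the two finance notes another, without the user having drawn a single link.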
Categories
In category-based structural frameworks, rather than being described in terms of their relationships to other elements (as with a tree or graph), items are simply grouped together in one or more categories, indicating that they have something in common. This scheme is based on the branch of pure mathematics called set theory, in which each of a body of objects either has, or does not have, membership in each of some number of sets. There is normally no restriction as to how many different categories a given item can belong to, as is the case with mathematical sets.
Users may think of categories as collections, in which the category somehow encloses or "owns" the items within it. Indeed, some systems depict categories in this fashion, such as the Vista interface where icons standing for documents are enclosed within ovals that represent categories. This is merely a convention of display, however, and fundamentally, categories are the same as simple keywords.
The most popular application to embrace the category approach was the original Agenda. All information retrieval in Agenda was performed in terms of category membership. Users specified queries that were lists of categories to include (or exclude), and only items that satisfied those criteria were displayed. Agenda was particularly sophisticated in that the categories themselves formed a tree hierarchy, rather than a flat namespace. Assigning an item to a category also implicitly assigned it to all ancestors in the hierarchy.
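Hierarchical categories with implicit ancestor membership, queried by inclusion and exclusion, can be sketched roughly as follows. This is an illustrative model, not Agenda's actual implementation; the `CategoryTree` class, its methods, and the sample items are hypothetical.

```python
class CategoryTree:
    """Items assigned to categories that themselves form a hierarchy."""

    def __init__(self):
        self.parent = {}    # category -> parent category (None at top level)
        self.assigned = {}  # item -> set of directly assigned categories

    def add_category(self, name, parent=None):
        self.parent[name] = parent

    def assign(self, item, category):
        self.assigned.setdefault(item, set()).add(category)

    def categories_of(self, item):
        """Directly assigned categories plus all of their ancestors."""
        result = set()
        for category in self.assigned.get(item, ()):
            while category is not None:
                result.add(category)
                category = self.parent.get(category)
        return result

    def query(self, include=(), exclude=()):
        """Items in every `include` category and in no `exclude` category."""
        hits = []
        for item in self.assigned:
            categories = self.categories_of(item)
            if set(include) <= categories and not set(exclude) & categories:
                hits.append(item)
        return sorted(hits)

# Hypothetical categories and items.
tree = CategoryTree()
tree.add_category("People")
tree.add_category("Colleagues", parent="People")
tree.add_category("Projects")
tree.add_category("Website", parent="Projects")

tree.assign("Call Ann about launch", "Colleagues")
tree.assign("Call Ann about launch", "Website")
tree.assign("Buy domain", "Website")
```

Both items match `tree.query(include=["Projects"])` even though neither was assigned to "Projects" directly, and adding `exclude=["People"]` leaves only "Buy domain".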
Personal Knowbase is a more modern commercial product based solely on a keyword (category) paradigm, though it uses a simple flat keyword structure rather than an inheritance hierarchy like Agenda. Haystack and Chandler are other information management tools which use categorization in important ways. William Jones' Memory Extender took an artificial intelligence twist on the whole notion of keywords/categories, by allowing an item's keywords to be weighted, and adjusted over time by both the user and the system. This allowed the strength of category membership to vary dynamically for each of an item's assignments, in an attempt to yield more precise retrieval.
Chronological
Yale University's Lifestreams project used timestamps as the principal means of organization and retrieval of personal documents. In Fertig et al.'s own words:
A lifestream is a time-ordered stream of documents that functions as a diary of your electronic life; every document you create is stored in your lifestream, as are the documents other people send you. The tail of your stream contains documents from the past, perhaps starting with your electronic birth certificate. Moving away from the tail and toward the present, your stream contains more recent documents such as papers in progress or the latest electronic mail you've received...
Documents are thus always ordered and accessed chronologically. Metadata-based queries on the collection produce "substreams," or chronologically ordered subsets of the original documents. The rationale for time-based ordering is that "time is a natural guide to experience; it is the attribute that comes closest to a universal skeleton-key for stored experience". Whether chronology is our principal or even a common natural coding mechanism psychologically can be debated. But since any PKB system can easily create such an index, it seems worthwhile to follow Lifestreams' lead and allow the user to sort and retrieve based on time, as many systems have done. If nothing else, it relieves the user from having to create names for knowledge elements, since the timestamp is always an implicit identifying mark. PlanPlus, based on the Franklin-Covey planner system, is also chronologically modeled, and a number of products based on other data models offer chronological indexing in addition to their core paradigm.
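As a rough illustration of chronological organization, the following Python sketch keeps documents in time order and derives substreams from metadata queries. It is only a sketch under stated assumptions: the `Document` and `Stream` classes, the `meta` dictionary, and the `since` helper are hypothetical names and do not reflect the Lifestreams software itself.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Document:
    created: datetime          # the timestamp doubles as an implicit identifier
    body: str
    meta: dict = field(default_factory=dict)   # hypothetical metadata

class Stream:
    """A collection of documents that is always kept in time order."""

    def __init__(self, docs=()):
        self.docs = sorted(docs, key=lambda d: d.created)

    def add(self, doc):
        self.docs.append(doc)
        self.docs.sort(key=lambda d: d.created)

    def substream(self, **criteria):
        # A metadata query yields another chronologically ordered stream.
        return Stream(d for d in self.docs
                      if all(d.meta.get(k) == v for k, v in criteria.items()))

    def since(self, moment):
        return Stream(d for d in self.docs if d.created >= moment)

stream = Stream()
stream.add(Document(datetime(2004, 5, 2), "draft of paper", {"kind": "paper"}))
stream.add(Document(datetime(1996, 1, 15), "first e-mail", {"kind": "mail"}))
stream.add(Document(datetime(2004, 6, 9), "reply from editor", {"kind": "mail"}))
mail = stream.substream(kind="mail")
```

Note that no document needs a user-supplied name; ordering and retrieval rely on the timestamp and metadata alone.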
Aquanet's framework
Though advertised as a hypertext system, Marshall et al.'s Aquanet went far beyond the traditional node-link graph model. Knowledge expressed in Aquanet is centered around "relations," or n-ary links between objects in which the semantics of each participant in the relation is specified by the relation type. Each type of relation specifies a physical display (i.e., how it will be drawn on the screen, and the spatial positioning of each of its participants), and a number of "slots" into which participants can be plugged. Each participant in a relation can be either a base object, or another relation. Users can thus define a schema of relation types, and then build a complex semantic model out of relations and objects. Since relation types can be specified to associate any number of nodes (instead of just two, as in the graph model), this potentially allows more complex relationships to be expressed.
Note, however, that the same effect can be achieved in the basic graph model by simply taking the n-ary relations and "reifying" them (i.e., turning them into nodes in their own right). For instance, suppose we define a relation type "assassination," with slot types of "assassin," "victim," "location," and "weapon". We could then create a relation based on this type where the participants are "John Wilkes Booth," "Abraham Lincoln," "Ford's Theatre," and "derringer". This allows us to express a complex relationship between multiple objects in Aquanet. But we can express the same knowledge with the basic graph model by simply creating a node called "Lincoln's assassination" and then creating typed links between that node and the other four labeled "assassin," "victim," etc. Aquanet's biggest achievement in this area is the ability to express the schema of relation types, so that the types of objects an "assassination" relation can connect are consistent and enforced.
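The reification argument can be made concrete with a short Python sketch. The representation chosen here, a graph as a set of (source, link type, target) triples with a `SCHEMA` table of required slots, is an assumption for illustration and is not Aquanet's data model or interface.

```python
# Hypothetical schema: relation type -> the slot names it requires.
SCHEMA = {"assassination": {"assassin", "victim", "location", "weapon"}}

def reify(graph, rel_type, node, **participants):
    """Store an n-ary relation in a plain graph as a node with typed links.

    `graph` is a set of (source, link_type, target) triples.
    """
    expected = SCHEMA[rel_type]
    if set(participants) != expected:
        # Enforcing the schema keeps every relation of this type consistent.
        raise ValueError(f"{rel_type} needs exactly the slots {sorted(expected)}")
    graph.add((node, "type", rel_type))
    for slot, target in participants.items():
        graph.add((node, slot, target))
    return node

graph = set()
reify(graph, "assassination", "Lincoln's assassination",
      assassin="John Wilkes Booth", victim="Abraham Lincoln",
      location="Ford's Theatre", weapon="derringer")
```

The four-way relation becomes one extra node and ordinary binary links; the schema check is the part that a bare graph model would not provide on its own.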
Knowledge elements
There are several options for specifying what knowledge elements consist of, and what kind of internal structure, if any, they possess:
- Word/phrase/concept. Most systems engineered for knowledge representation encourage structures to be composed of very simple elements, usually words or phrases. This is in the spirit of both mind mapping and concept mapping, where users are encouraged to use simple phrases to stand for mental concepts.
- Free text notes. Nearly all systems permit large amounts of free text to exist in the PKB, either as the contents of the elements themselves (NoteCards, Hypercard, TreePad) or attached to elements as separate, supplementary pages (Agenda, Zoot, HogBay).
- Links to an information space. Since a user's knowledge base is to correspond to her mental perceptions, it seems profitable for the PKB to point to entities in the information space from which she formed those perceptions. Many systems do in fact allow their knowledge elements to point to the original sources in some way. There are three common techniques:
- The knowledge element actually represents an original source. This is the case for document management systems (WebTop, MyLifeBits, Haystack), integrated search facilities (NaviQue, CYCLADES), and VIKI/VKB. Tinderbox will also allow one of its notes to be a URL, and the user can control whether its contents should be captured once or "auto-fetched" so as to receive constant web updates. Many systems, in addition to storing a page of free text for each knowledge element, also permit any number of hyperlinks to be attached to a knowledge element (e.g., Freemind, PersonalBrain, Inspiration). VNS, which allows users to point to a community notebook page from within their personal notebook, gives similar functionality.
- The knowledge element is a repurposed snippet from an original source. This is potentially the most powerful form, but is rare among fully featured PKB systems. Cartagio, Hunter-Gatherer, and YellowPen all allow Web page excerpts to be assimilated and organized, although they primarily only do that, without allowing them to easily be combined with other subjective knowledge. DEVONThink and MyBase's WebCollect plug-in add similar functionality to their more general-purpose, tree-based information managers. Both of these systems, when a snippet is captured, archive the entire Web page locally so it can be returned to later. The user interfaces of CircusPonies and StickyBrain have been heavily optimized towards grabbing information from other applications and bringing them into the PKB without disturbing the user's workflow.
- Composites. Some programs allow a user to embed knowledge elements (and perhaps other information as well) inside a knowledge element to form an implicit hierarchy. Trees by themselves fall into this category, of course, since each node in the tree can be considered a "composite" of its content and children. But a few graph-based tools offer composite functionality as well. In Aquanet, "relations" form the fundamental means of connection, and the units that are plugged into a relation can be not only objects, but other relations as well. This lends a recursive quality to a user's modeling. VIKI/VKB's spatial environment offers "subspaces" which let a user partition their visual workspace into subregions, whose internal contents can be viewed at a glance from the parent. Boxer's paradigm is similar. Tinderbox is a graph-based tool that supports hierarchical composite structures, and Compendium extends this even further by allowing transclusion of "views" as well as of nodes. Unlike the other tools, in Compendium the composite hierarchy does not form a DAG, but rather an arbitrary graph: view A can appear on view B, and B can in turn appear on A. The user's intuitive notion of "inside" must be adapted somewhat in this case.
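The consequence of letting composites form an arbitrary graph can be seen in a small Python sketch. The `contains` mapping and the `contents` function are hypothetical and only model the situation described for Compendium, where two views appear on each other; they are not part of any of the tools named above.

```python
def contents(view, contains, seen=None):
    """Collect everything reachable "inside" a view, tolerating cycles."""
    seen = set() if seen is None else seen
    for child in contains.get(view, ()):
        if child not in seen:       # the visited set stops infinite recursion
            seen.add(child)
            contents(child, contains, seen)
    return seen

# View A appears on view B, and B in turn appears on A.
contains = {"A": ["B", "note1"], "B": ["A", "note2"]}
inside_a = contents("A", contains)
```

With this data, `inside_a` includes "A" itself, which is one way of seeing why the intuitive notion of "inside" no longer quite applies once the hierarchy is not a DAG.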
Schema
In the context of PKBs, "schema" means the ability for a user to specify types and introduce structure to aspects of the data model. It is a form of metadata whereby more precise semantics can be applied to various elements of the system. This facilitates more formal knowledge expression, ensures consistency across items of the same kind, and can better allow automated agents to process the information.
Both knowledge elements and links can carry various aspects of schema.
Schema for knowledge elements
In a PKB, a "type system" allows users to specify that a knowledge element is a member of a specific class or category of items, to provide a built-in method of organization and retrieval. Generally speaking, systems can make knowledge elements untyped, rigidly typed, or flexibly typed. In addition, they can incorporate some notion of inheritance among elements and their types.
There is a distinction between types and categories here. A category-based scheme typically allows any number of categories/keywords to be assigned to an item. There are two differences between this and the notion of type. First, items are normally restricted to being of a single type, and this usually indicates a more intrinsic, permanent property of an item than simply its presence in a category collection. (For example, one could imagine an item called "XYZ Corporation" shifting into and out of categories like "competitors," "overseas distributors," or "delinquent debtors" over time, but its core type of "company" would probably be static for all time.) Second, types often carry structural specifications with them: if an item is of a given type, this means it will have values for certain attributes appropriate to that type. Some systems that do not allow typing offer the ability to approximate this function through categories.
Untyped elements are typical among informal knowledge capture tools, since they are designed to stimulate brainstorming and help users discover their nascent mental models. These tools normally want to avoid forcing the user to commit to structure prematurely. Most mind mapping and many concept mapping tools are in this category: a concept is simply a word or phrase, with no other semantic information (e.g., Visual Mind). Note-taking tools also usually take this approach, with all units of information being of the same type "note".
At the other extreme are tools which, like older relational database technology, require all items to be declared as of a specific type when they are created. Often this type dictates the internal structure of the element. These tools are better suited to domains in which the structure of knowledge to be captured is predictable, well understood, and known in advance. For PKB systems, they are probably overly restrictive. KMap and Compendium are examples of tools that allow (and require) each item to be typed; in their case, the type controls the visual appearance of the item, rather than any internal structure.
In between these two poles are systems that permit typed and untyped elements to co-exist. NoteTaker is such a product; it holds simple free-text pages of notes, without any structure, but also lets the user define "templates" with predefined fields that can be used to instantiate uniformly structured forms. TreePad has a similar feature. Some other systems blur the distinction between typed and untyped, allowing the graceful introduction of structure as it is discovered. VKB, for example, supports an elegant, flexible typing scheme, well suited to PKBs. Items in general consist of an arbitrary number of attribute-value pairs. But when consistent patterns emerge across a set of objects, the user can create a type for that group, and with it a list of expected attributes and default values. This structure can be selectively overridden by individual objects, however, which means that even objects assigned to a particular type have flexible customization available to them. Tinderbox offers an alternate way of achieving this flexibility, as described below.
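The gradual introduction of structure described for VKB might look roughly like the following Python sketch. The `Element` and `ElementType` classes, the attribute names, and the second company are all assumptions for illustration; VKB's real object model may differ.

```python
class ElementType:
    """A type created after the fact, carrying expected attributes and defaults."""

    def __init__(self, type_name, **defaults):
        self.type_name = type_name
        self.defaults = defaults

class Element:
    """An item is, at first, just a bag of attribute-value pairs."""

    def __init__(self, **attrs):
        self.type = None            # untyped until a pattern emerges
        self.attrs = attrs

    def get(self, key):
        if key in self.attrs:                       # the object's own value wins
            return self.attrs[key]
        if self.type is not None and key in self.type.defaults:
            return self.type.defaults[key]          # otherwise fall back to the type
        raise KeyError(key)

# Two elements start out untyped.
xyz = Element(name="XYZ Corporation", country="Canada")
abc = Element(name="ABC Ltd")       # hypothetical second item

# Later the user notices the pattern and introduces a type with defaults.
company = ElementType("company", country="USA", status="active")
xyz.type = company
abc.type = company
```

After typing, `abc.get("country")` yields the type's default while `xyz.get("country")` keeps the value the object set for itself.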
Finally, the object-oriented notion of type inheritance is available in a few solutions. The different card types in NoteCards are arranged into an inheritance hierarchy, so that new types can be created as extensions of old. Aquanet extends this to multiple inheritance among types; the "slots" that an object contains are those of its type, plus those of all supertypes. SPRINT and Tinderbox also use a frame-based approach, and allow default values for attributes to be inherited from supertypes. This way, an item need not define values for all its attributes explicitly: unless overridden, an item's slot will have the shared, default value for all items of that type.
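Inherited default values in a frame-based scheme can be sketched in a few lines of Python. The `Frame` class, the depth-first search order over multiple supertypes, and the example slot names are assumptions chosen for illustration, not the behavior of SPRINT, Tinderbox, or Aquanet.

```python
class Frame:
    """A frame with its own slots and any number of supertypes."""

    def __init__(self, name, parents=(), **slots):
        self.name = name
        self.parents = tuple(parents)
        self.slots = slots

    def lookup(self, slot):
        if slot in self.slots:              # an explicit value overrides defaults
            return self.slots[slot]
        for parent in self.parents:         # otherwise search supertypes in order
            try:
                return parent.lookup(slot)
            except KeyError:
                continue
        raise KeyError(slot)

document = Frame("document", language="English", status="draft")
citable = Frame("citable", citation_style="author-year")
paper = Frame("paper", [document, citable], status="published")
my_paper = Frame("my paper", [paper], title="Notes on PKBs")
```

The item `my_paper` defines only a title, yet `lookup` also returns a language, a status, and a citation style for it, each taken from the nearest frame that supplies one.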
Other forms of schema
In addition to the structure that is controlled by an item's type, other forms of metadata and schema can be applied to knowledge elements.
- Keywords. Many systems let users annotate items with user-defined keywords. Here the distinction between an item's contents and the overall knowledge structure becomes blurred, since an item keyword could be considered either a property of the item, or an organizational mechanism that groups it into a category with like items. Systems using the category data model (e.g., Agenda) can employ keywords for the latter purpose. Some systems based on other data models also use keywords to achieve category-like functionality.
- Attribute/value pairs. Arbitrary attribute/value pairs can also be attached to elements in many systems, which gives a PKB the ability to define semantic structure that can be queried. Frame-based systems like SPRINT and Aquanet are examples, as well as NoteTaker, VKB, and Tinderbox. MindPad [AKS-Labs 2005] is notable for taking the basic concept mapping paradigm and introducing schema to it via its "model editor". As mentioned earlier, adding user-defined attribute/value pairs to the items in an outliner yields spreadsheet-like functionality, as in Ecco and OmniOutliner. Some systems feature attribute/value pairs, but only in the form of system-defined attributes, not user-defined ones.
- Knowledge element appearance. Some tools modify a knowledge element's visual appearance on the screen in order to convey meaning to the user. SMART Ideas and Visual Mind let the user freely choose each element's icon from a variety of graphics, while KMap ties the icon directly to its underlying type. Other graphical aspects that can be modified include color (VIKI), the set of attributes shown in a particular context (VKB), and the spatial positioning of objects in a relation (Aquanet).
Schema for links
In addition to prescribing a schema for knowledge elements, many systems allow some form of information to be attached to the links that connect them.
In most of the early hypertext systems, links were unnamed and untyped, their function being merely to associate two items in an unspecified manner. The mind mapping paradigm also does not name links, but for a different reason: the implicit type of every link is one of generalization/specialization, associating a topic with a subtopic. Hence specifying types for the links would be redundant, and labeling them would clutter the diagram.
Concept mapping prescribes the naming of links, such that the precise nature of the relationship between two concepts is made clear. As mentioned above, portions of a concept map are meant to be read as English sentences, with the name of the link serving as a verb phrase connecting the two concepts. Numerous systems thus allow a word or phrase to decorate the links connecting elements.
Named links can be distinguished from typed links, however. If the text attached to a link is an arbitrary string of characters, unrelated to that of any other link, it can be considered the link name. Some systems, however, encourage the re-use of link names that the user has defined previously. In PersonalBrain, for instance, before specifying the nature of a link, the user must create an appropriate "link type" (associated with a color to be used in presentation) in the system-wide database, and then assign that type to the link in question. This promotes consistency among the names chosen for links, so that the same logical relationship types will hopefully have the same tags throughout the knowledge base. This feature also facilitates searches based on link type, among other things. Other systems, especially those suited for specific domains such as decision modeling (gIBIS and Banxia Decision Explorer), predefine a set of link types that can be assigned (but not altered) by the user.
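The difference between a free-text link name and a typed link can be sketched as follows in Python. The `LinkRegistry` class and its methods are hypothetical; they only mimic the workflow described above, in which a link type must be defined system-wide (here with a display color) before it can be assigned.

```python
class LinkRegistry:
    """Links may only use types that were defined beforehand."""

    def __init__(self):
        self.types = {}     # link type name -> display color
        self.links = []     # (source, type name, target) triples

    def define_type(self, name, color):
        self.types[name] = color

    def connect(self, source, target, type_name):
        if type_name not in self.types:
            # Rejecting ad hoc names is what keeps link labels consistent.
            raise KeyError(f"undefined link type: {type_name!r}")
        self.links.append((source, type_name, target))

    def by_type(self, type_name):
        """Search by link type, which free-text names would make unreliable."""
        return [(s, t) for s, kind, t in self.links if kind == type_name]

kb = LinkRegistry()
kb.define_type("causes", "red")
kb.define_type("supports", "green")
kb.connect("rain", "wet streets", "causes")
kb.connect("survey data", "hypothesis", "supports")
kb.connect("wet streets", "accidents", "causes")
```

A call such as `kb.connect("rain", "floods", "leads to")` fails because "leads to" was never defined, which is the consistency benefit the passage describes.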
Some more advanced systems allow links to bear attribute/value pairs themselves, and even embedded structure, similar to those of the items they connect. In Haystack this is the case, since links ("ties") and nodes ("needles") are actually defined as subtypes of a common type ("straw").
KMap similarly defines a link as a subclass of node, which allows links to represent n-ary relationships between nodes, and enables recursive structure within a link itself. It is unclear how much value this adds in knowledge modeling, or how often users take advantage of such a feature. Neptune and Intermedia are two older systems that also support attributes for links, albeit in a simpler manner.
Another aspect of links that generated much fervor in the early hypertext systems was that of link precision: rather than merely connecting one element to another, systems like Intermedia defined anchors within documents, so that a particular snippet within a larger element could be linked to another snippet. The Dexter model covers this issue in detail. For PKB purposes, this seems to be most relevant as regards links to the objective space, as discussed previously. If the PKB truly contains knowledge, expressed in appropriately fine-grained parts, then link precision between elements in the knowledge base is much less of a consideration.
This discussion on links has only considered connections between knowledge elements in the system, where the system has total control over both ends of the connection. As described in the previous section, numerous systems provide the ability to "link" from a knowledge element inside the system to some external resource, e.g. a file or a URL. These external links typically cannot be enhanced with any additional information, and serve only as convenient retrieval paths, rather than as aspects of knowledge representation.
Architecture
The idea of a PKB gives rise to some important architectural considerations. While not constraining the nature of what knowledge can be expressed, the architecture nevertheless affects more mundane matters such as availability and workflow. But even more importantly, the system's architecture determines whether it can truly function as a lifelong, integrated knowledge store--the "base" aspect of the personal knowledge base defined above.
File-based
Traditionally, most electronic PKB systems have employed a simple storage mechanism based on flat files in a filesystem. This is true of virtually all of the mind mapping tools (MindManager), concept mapping tools, and even a number of hypertext tools (NoteCards, Hypercard, Tinderbox). Typically, the main "unit" of a user's knowledge design--whether that be a mind map, a concept map, an outline, or a "notebook"--is stored in its own file somewhere in the filesystem. The application can find and load such files via the familiar "File | Open..." paradigm, at which point it typically maintains the entire knowledge structure in memory.
The advantage of such a paradigm is familiarity and ease of use; the disadvantage is a possibly negative influence on knowledge formulation. Users must choose one of two basic strategies: either store all of their knowledge in a single file; or else break up their knowledge and store it across a number of different files, presumably according to subject matter and/or time period. The first choice can result in scalability problems--consider how much knowledge a user might collect over a decade, if they stored things related to their personal life, hobbies, relationships, reading materials, vacations, academic course notes, multiple work-related projects, future planning, etc. It seems unrealistic to keep adding this kind of volume to a single, ever-growing multi-gigabyte file. The other option, however, is also constraining: each bit of knowledge can be stored in only one of the files (or else redundantly, which leads to synchronization problems), and the user is forced to choose this at knowledge capture time.
Database-based
If a PKB's data is stored in a database system, then knowledge elements reside in a global space, which allows any idea to relate to any other: now a user can relate a book he read on productivity not only to other books on productivity, but also to "that hotel in Orlando that our family stayed in last spring," because that is where he remembers having read the book. Though such a relationship may seem "out of bounds" in traditional knowledge organization, it is exactly the kind of retrieval path that humans often employ in retrieving memories. The database architecture enables a PKB to truly form an integrated knowledge base, and contain the full range of relationships.
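A minimal sketch of such a global space, using Python's standard `sqlite3` module with an in-memory database, is shown below. The two-table layout, the column names, and the example labels are assumptions for illustration and are not taken from any of the systems discussed here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE element (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE link (
        source INTEGER NOT NULL REFERENCES element(id),
        target INTEGER NOT NULL REFERENCES element(id),
        kind   TEXT NOT NULL
    );
""")

def add_element(label):
    """Insert a knowledge element and return its id."""
    return db.execute("INSERT INTO element (label) VALUES (?)", (label,)).lastrowid

def relate(source, target, kind):
    db.execute("INSERT INTO link VALUES (?, ?, ?)", (source, target, kind))

def related(element_id):
    """Labels and link kinds of everything the element points to."""
    rows = db.execute(
        "SELECT e.label, l.kind FROM link l JOIN element e ON e.id = l.target "
        "WHERE l.source = ? ORDER BY e.label", (element_id,))
    return rows.fetchall()

# All elements live in one space, so any two can be linked.
book = add_element("productivity book a")
other = add_element("productivity book b")
hotel = add_element("hotel in orlando")
relate(book, other, "same topic")
relate(book, hotel, "read at")
```

Because nothing partitions the elements by subject, the link from a book to a hotel is stored in exactly the same way as the link between two books.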
Agenda and gIBIS were two early tools that incorporated a database backend into their architecture. More recently, the MyLifeBits project uses Microsoft SQL Server as its storage layer, and Compendium interfaces with the open source MySQL database. A few note-taking applications also store information in an integrated database rather than in user-named files. The only significant drawback to this architectural choice (other than the modest footprint of the database management system) is that data is more difficult to copy and share across systems. This is one true advantage of files: it is a simple matter to copy them across a network, or include them as an e-mail attachment, where they can be read by the same application on a different machine. This problem is solved by some of the following architectural choices.
Client-server
Decoupling the actual knowledge store from the PKB user interface can achieve architectural flexibility. As with all client-server architectures, the benefits include load distribution, platform interoperability, data sharing, and ubiquitous availability. Increased complexity and latency are among the liabilities, which can indeed be considerable factors in PKB design.
One of the earliest and best examples of a client-server knowledge base was the Neptune hypertext system. Neptune was tailored to the task of maintaining shared information within software engineering teams, rather than to personal knowledge storage, but the elegant implementation of its "Hypertext Abstract Machine" (HAM) was a significant and relevant achievement. The HAM was a generic hypertext storage layer that provided node and link storage and maintained version history of all changes. Application layers and user interfaces were to be built on top of the HAM. Architecturally, the HAM provided distributed network access so that client applications could run from remote locations and still access the central store. Another, more recent example, is the Scholarly Ontologies Project whose ClaiMapper and ClaiMaker components form a similar distributed solution in order to support collaboration.
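The idea of a generic storage layer that keeps node and link data together with a version history can be illustrated by a very small Python sketch. It does not reproduce the HAM's actual interface: the `VersionedStore` class, its method names, and the choice to version only node contents are all assumptions.

```python
class VersionedStore:
    """A toy storage layer that never overwrites node contents."""

    def __init__(self):
        self._history = {}      # node id -> list of contents, oldest first
        self._links = set()     # (source id, target id) pairs

    def put(self, node_id, content):
        """Record a new version of a node and return its version number."""
        versions = self._history.setdefault(node_id, [])
        versions.append(content)
        return len(versions) - 1

    def get(self, node_id, version=-1):
        """Fetch a given version; by default the most recent one."""
        return self._history[node_id][version]

    def link(self, source, target):
        self._links.add((source, target))

    def links_from(self, source):
        return sorted(t for s, t in self._links if s == source)

store = VersionedStore()
store.put("spec", "first draft")
latest = store.put("spec", "second draft")
store.put("design", "initial design")
store.link("spec", "design")
```

Applications and user interfaces would sit on top of such a layer and, in a client-server setting, reach it over the network rather than calling it in-process as this sketch does.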
These systems implemented a distributed architecture primarily in order to share data among colleagues. For PKBs, the prime motive is rather user mobility. This is a key consideration, since if a user is to store all of their knowledge in a single integrated store, they will certainly need access to it in a variety of settings. MyBase Networking Edition is one example of how this might be achieved. A central server hosts the user's data, and allows network access from any client machine. Clients can view the knowledge base from within the MyBase application, or through a Web browser (with limited functionality).
The Haystack project outlines a three-tiered architecture, which allows the persistent store, the Haystack data model itself, and the clients that access it to reside on separate machines. The interface to the middle tier is flexible enough that a number of different persistent storage models can be used, including relational databases, semistructured databases, and object-oriented databases. Presto's architecture exhibits similar features.
Web-based
A variation of the client-server approach is Web-based systems, in which the client system consists of nothing but a (possibly enhanced) browser. This gives the same ubiquitous availability that client-server approaches do, while minimizing (or eliminating) the setup and installation required on each client machine.
KMap was one of the first knowledge systems to integrate with the World Wide Web. It allowed concept maps to be shared, edited, and remotely stored using HTTP. Concept maps were still created using a standalone client application for the Macintosh, but they could be uploaded to a central server, and then rendered in browsers as "clickable GIFs". Clicking on a concept within the map image in the browser window would have the same navigation effect as clicking on it locally inside the client application. In nearly all Web-based systems, the user's knowledge expressions are stored on a central server rather than locally on the browser's machine.
Handheld devices
Lastly, mobile devices are a possible PKB architecture. Storing all of one's personal knowledge on a PDA would solve the availability problem, of course, and even more completely than would a client-server or web-based architecture. The safety of the information is an issue, since if the device were to be lost or destroyed, the user could face irrevocable data loss; this is easily remedied, however, by periodically synchronizing the device's contents with a host computer.
Most handheld applications are simple note-taking software, with far fewer features than their desktop counterparts. BugMe! is an immensely popular note-taking tool that simply lets users enter text or scribble onto "notes" (screenfuls of space) and then organize them in primitive ways. Screen shots can be captured and included as graphics, and the tool features an array of drawing tools, clip art libraries, etc. The value added by this and similar tools is purely the size and convenience of the handheld device, not the ability to manage large amounts of information.
Perhaps the most effective use of a handheld architecture would be as a satellite data capture and retrieval utility. A user would normally employ a fully functional desktop application for personal knowledge management, but when "on the go," they could capture knowledge into a compatible handheld application and upload it to their PKB at a later convenient time. To enable mobile knowledge retrieval, either select information would need to be downloaded to the device before the user needed it, or else a wireless client-server solution could deliver any part of the PKB on demand. This is essentially the approach taken by software like KeySuite, which supplements a feature-rich desktop information management tool (e.g. Microsoft Outlook) by providing access to that information on the mobile device.
See also
- Commonplace book
- Lifelog
- Notetaking
- Comparison of notetaking software
- Outliner
- Personal knowledge management
- Personal wiki
- List of wiki software § Personal wiki software
- Tag (metadata) § Knowledge tags