8:00-8:30 a.m. Continental Breakfast Hosted by
OCLC
8:30-8:50 a.m. Chair's Report
(Hideyuki Morimoto, U.C. Berkeley)
8:50-10:05 a.m. OCLC CJK Users Group Annual Meeting Program
10:05-10:15 a.m. Break
10:15-11:50 a.m. OCLC Report
After a continental breakfast hosted by OCLC, Hideyuki Morimoto, Chair,
convened the meeting at 8:30. He thanked OCLC for its continued financial
and logistical support of the annual meeting. In particular, we are grateful
for the financial support provided so that our guest speaker, John Jenkins,
could join us from California.
Convener: Hideyuki Morimoto
Chair's Report
Mr. Morimoto introduced the current officers, committee members, and
their activities:
Current Officers (1999-2001):
Website Management Committee
Program Committee Members:
Membership Officer
Pinyin Task Force
The Task Force participated in reviewing OCLC pinyin conversion test
files; conducted a Pinyin Conversion Survey of OCLC CJK member libraries
regarding pinyin conversion and pinyin cataloging issues for Chinese language
materials; analyzed responses; reported findings.
The Chair encouraged close communication with:
OCLC by arranging for the OCLC CJK Users Group meeting, collaborating on
other meetings such as the Z39.50 session, communicating Users Group concerns
to OCLC and informing the Users Group membership of organizational and
software changes.
Members by communicating directly with each new member to encourage participation.
Bylaws: because a proposed revision of the bylaws regarding terms of office
(Section C, Article V) received insufficient support, the Executive Committee
left the bylaws unchanged.
Nominating Committee:
Mr. Morimoto then announced the election results and introduced the
new officers, who will serve from 2001-2003:
OCLC CJK Users Group Annual Meeting Program
Wen-ling Liu introduced the agenda, the speakers, and the program committee
members. To celebrate the Users Group's 10th anniversary in 2001, the Program
Committee invited one of its founders, Karl Lo, UC San Diego, to talk about the
prospects and future of the Users Group. Dr. Lo served as Chair
of the Users Group from 1993 to 1995.
Thanks to the generous travel support of OCLC, especially Glenn Patton,
the Program Committee was able to invite Mr. John Jenkins, who works at
Apple and is one of the Technical Directors of the Unicode Consortium,
to give a presentation on Unicode and East Asian ideographs.
10th Anniversary Speech -- Karl Lo, University of California,
San Diego
Karl Lo graciously shared with the audience his vision of the future and of
how we should anticipate it so that we can steady our course and reap its
rewards. He began by quoting Jay Jordan, President and CEO of OCLC, from the
new strategic plan available on the OCLC web site.
The OCLC CJK database consists of 2 million records, and each record
contains approximately 5,000 characters. This means that the entire OCLC
CJK database could comfortably fit on the hard drive of a normal desktop
computer and still leave room for the contents of the Si ku quan shu. Each
of these desktop computers containing personal digital libraries can be
connected to the Internet. Our challenge is to find a way to unlock the
power of the personal digital library. If we can meet the challenge that
OCLC poses with its own strategic plan to change from a bibliographic utility
to a virtual library of multilingual, multiscript, multimedia libraries,
we will no longer be just catalogers but virtual library organizers and
users.
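Karl Lo's storage estimate can be checked with a quick back-of-the-envelope calculation. The one-byte-per-character figure is an assumption made only for illustration; CJK text held in a multibyte encoding would need roughly two to three times as much space, so the result is a rough order of magnitude, not a measurement.

$$
2\times10^{6}\ \text{records} \times 5\times10^{3}\ \frac{\text{characters}}{\text{record}} = 10^{10}\ \text{characters} \approx 10\ \text{GB at one byte per character}
$$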
Unicode & East Asian Ideographs -- John Jenkins,
Apple Computer
The PowerPoint presentation is available as a PDF file: <http://homepage.mac.com/jenkins/Papers/OCLC.pdf>
Unicode is a trademark owned by the Unicode Consortium (<http://www.unicode.org>);
it cannot be used as part of a product name trademarked by someone else.
The Unicode Standard is available in both book and online formats. The most
recent version is Unicode 3.1. ISO/IEC 10646 is very close to Unicode,
though not exactly the same. The original work on Unicode was done by Xerox
and has since gone worldwide. With each new version of Unicode, the number
of ideographs has increased dramatically; the most recent version (3.1)
adds 43,253 new ideographs.
The Unicode Standard follows ten design principles.
How do ideographs get into the standard? The number of ideographs is
huge, so the problem is deciding which ones to include. It is solved by the
Ideograph Rapporteur Group (IRG) <http://www.cs.cuhk.edu.hk/~irg/>,
which decides whether characters in one character set are the same as
characters in another set. Each character is also given a dictionary position
(e.g., in the Kangxi dictionary, Dai Kanwa jiten, Hanyu da ci dian, and
Daejaweon). Virtual positions are assigned to characters that are missing
from particular dictionaries. Unicode also contains duplicate ideographs,
known as compatibility ideographs, to cover variant pronunciations and other
compatibility needs. The standard now contains 71,089 ideographs, more unique
ideographs than any dictionary.
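A compatibility ideograph can be inspected with Python's standard `unicodedata` module. The sketch below uses U+F900 as an illustrative choice (it was not mentioned in the talk): it occupies its own code point, but its canonical decomposition points back to the unified ideograph it duplicates, so normalization replaces it.

```python
import unicodedata

# U+F900 is one of the CJK compatibility ideographs: a duplicate code point
# kept for compatibility with an existing character set.
compat = "\uF900"

# Its canonical decomposition names the unified ideograph it duplicates.
print(unicodedata.decomposition(compat))   # 8C48

# NFC normalization therefore replaces it with U+8C48.
unified = unicodedata.normalize("NFC", compat)
print(unicodedata.name(compat))    # CJK COMPATIBILITY IDEOGRAPH-F900
print(unicodedata.name(unified))   # CJK UNIFIED IDEOGRAPH-8C48
```

One consequence worth knowing: any system that normalizes text will silently replace such duplicates with their unified counterparts.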
Retrospective Conversion Projects at Harvard-Yenching
Library -- James Lin, Harvard-Yenching Library
In 1996 the Harvard-Yenching Library signed a contract with OCLC to work
on the second phase of retrospective conversion for its East Asian collections.
This project, the largest OCLC CJK project to date, will be completed on
time by the end of June 2001. As of that date, all of the Harvard-Yenching
Library's titles will be accessible online worldwide.
Over the past eight years, the Harvard-Yenching Library has undertaken
two retrocon projects. In the first, grants from the Korea Foundation
in Seoul, Korea, and the United Daily News Group in Taipei, Taiwan, supported
the conversion of 17,000 Korean and 42,500 Chinese card catalog records
into machine-readable format, with both romanized and vernacular scripts.
OCLC was selected to work on the project, and an official contract was
signed between Harvard University and OCLC on October 22, 1993.
The project started in January 1994 and was completed in January 1995,
as specified in the contract. In fact, OCLC finished the work a week
ahead of schedule.
The second retrocon project started in June 1996, again contracting with
OCLC, to convert approximately 325,000 catalog cards to machine-readable
format over a period of five years. The project was funded jointly by Harvard
University and the Harvard-Yenching Institute, each committing up
to $1.1 million.
The materials to be converted during this project include Chinese, Japanese,
Korean, and Vietnamese monographs, serials, and microforms. CJK rare books
are also included. It is worth mentioning that we try to provide analytics
for every large series in our collection. For example, for the Korean series
Han'guk yŏktae munjip ch'ongsŏ (published in 1987), which collects the works
of 3,500 Korean authors in 3,000 volumes, we analyzed every title. We have
also completed analytics for the Chinese "Si ku quanshu" series and all other
"Si ku"-related series.
The final completion of the retrocon project enables the Harvard-Yenching
Library to realize its long-standing goal: the computerization of its entire
catalog in both romanized and vernacular scripts. It also enriches the
OCLC WorldCat database and makes these valuable East Asian research materials
readily available to scholars and researchers around the world.
Pinyin Task Force Report -- Sarah Elman, University
of California, Los Angeles
The OCLC CJK Users Group Pinyin Task Force was asked by OCLC to review
its first conversion test file in December 2000. A total of 440
pairs of records (consisting of "before" and "after" images of the bibliographic
records) were sent to us. The records were divided equally among the five
Task Force members. Review findings and comments were sent to OCLC in early
January. OCLC then worked on improving the conversion program based on
our comments. The second test file, consisting of the same set of records,
was sent to us in late January. We completed the second review in early
February.
The second test file showed noticeable improvements, and some of the
problems in the previous test file were successfully corrected. However,
only about 35% of the records were error-free. The following problems remained
unsolved as of the second review:
1. Some Wade-Giles elements did not convert -- This occurred mostly
in
2. Converted Pinyin subject headings in test records do not
match the authority file.
3. Inadvertent conversion (i.e., Wade-Giles data that should
not have been converted to Pinyin). Examples include:
4. The names of single-character Chinese counties have been
incorrectly converted to one word. Examples:
5. Inconsistency in capitalization and word division of names
of jurisdictions and geographic features, such as Sheng, Shi, Xian, Xingzhengqu,
Zhonghua Renmin Gongheguo, etc.
6. An apostrophe is not supplied when the first syllable ends with
the letter n and the second begins with the letter g.
7. Gibberish appears after conversion, especially in the 245 field.
8. Another major problem is the "garbage in, garbage out" phenomenon.
Many records have typos, spelling errors, incorrect diacritics, etc.
The inconsistent use of hyphens in names is among the most challenging
problems that OCLC and all librarians need to grapple with.
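Problem 6 concerns a syllable-joining rule that the conversion program missed. The sketch below shows one way such a rule could be applied when romanized syllables are joined into a word. `join_syllables` is a hypothetical helper, not OCLC's conversion code. It applies the n + g rule exactly as the Task Force states it in item 6, and adds the standard Hanyu Pinyin rule of marking a non-initial syllable that begins with a, o, or e.

```python
VOWEL_INITIALS = ("a", "o", "e")

def join_syllables(syllables):
    """Join romanized syllables into one word, adding apostrophes where needed.

    Hypothetical helper for illustration only.
    """
    if not syllables:
        return ""
    word = syllables[0]
    for prev, nxt in zip(syllables, syllables[1:]):
        first, last = nxt[0].lower(), prev[-1].lower()
        # Standard Hanyu Pinyin rule: mark a syllable beginning with a, o or e.
        # Rule as stated in item 6: mark an n + g syllable boundary.
        if first in VOWEL_INITIALS or (last == "n" and first == "g"):
            word += "'"
        word += nxt
    return word

print(join_syllables(["Xi", "an"]))     # Xi'an
print(join_syllables(["Xin", "gang"]))  # Xin'gang
print(join_syllables(["Bei", "jing"]))  # Beijing
```

Because the decision is made at each syllable boundary, the helper needs the syllable division as input; a converter working on already-joined words would first have to recover that division from the Wade-Giles source.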
OCLC Reports to the CJK Users Group are available from: ftp://ftp.rsch.oclc.org/pub/documentation/cjk_users_group/
Due to time constraints only the first two presentations were given.
Written reports covering the rest of the intended presentations were included
in the written documentation distributed by OCLC at the meeting.
OCLC 21st Century Global Strategy: Focusing on Metadata
Services Roadmaps -- Marty Withrow, Director, Metadata Services Division
Marty Withrow began by recognizing the outgoing committee members for their
service. He outlined the three-year strategic plan that Karl Lo mentioned
earlier. OCLC has revised its strategic plan to reflect the shift libraries
and librarians have made from being custodians of the book to being
service-oriented information managers.
WorldCat is going worldwide. Instead of waiting for libraries to bring
their data to Dublin, OCLC is going to use linking technology to go out
to the data. Data means more than books; it will also include images, sound
files, and other formats.
The Extended WorldCat will cover four major service areas:
Questions and Answers:
When will we be able to do NACO from within CJK software? We will have
to wait until the integrated software is available.
When will we be able to do CJK in CORC? Not until the summer
of 2002, when OCLC will provide Unicode support.
When will we be able to include vernacular in ILL requests? OCLC is
working on the implementation of the ISO protocol for record exchange.
OCLC Metadata Policy and Standards: Focusing on OCLC
Pinyin Conversion -- Glenn Patton, Director, Metadata Policy and Standards
Division
Glenn Patton's presentation focused on the Pinyin Conversion project.
Details on the OCLC Pinyin Conversion Project are available from <http://www.oclc.org/oclc/pinyin/index.htm>
Meeting participants received a summary sheet from Glenn, "OCLC Metadata
Standards and Quality Update," with details on the conversion process.
In September 2000, 152,000 authority records were converted.
OCLC is currently working on the fifth and final test of the conversion
software. Almost all problems noted by the Pinyin Task Force
have been dealt with during the last round of conversion software tweaking.
The only issue that OCLC can't deal with is the "garbage in garbage out"
problem in the original Wade-Giles records. Missing diacritics in
Wade-Giles words will generate incorrect Pinyin words.
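The "garbage in, garbage out" problem is easy to see in a toy converter. The hypothetical table below handles only a few Wade-Giles initials, where the aspiration apostrophe alone distinguishes, for instance, ch'ang from chang. It is an illustration, not OCLC's conversion software; a real converter needs many more rules (for example, ch and ch' become j and q before i and ü, and hs becomes x). If a record has lost the apostrophe, the output is a valid but wrong Pinyin syllable.

```python
# Hypothetical, deliberately partial table of Wade-Giles initials and their
# Pinyin equivalents. Aspirated forms (with an apostrophe) are listed before
# their unaspirated counterparts so that the longer prefix is matched first.
INITIALS = [
    ("ts'", "c"), ("ch'", "ch"), ("k'", "k"), ("p'", "p"), ("t'", "t"),
    ("ts", "z"), ("ch", "zh"), ("k", "g"), ("p", "b"), ("t", "d"),
]

def convert_initial(syllable: str) -> str:
    """Convert the initial of one Wade-Giles syllable such as chang or ts'ao."""
    for wade_giles, pinyin in INITIALS:
        if syllable.startswith(wade_giles):
            return pinyin + syllable[len(wade_giles):]
    return syllable  # no initial from the table: leave unchanged

print(convert_initial("ch'ang"))  # chang
print(convert_initial("chang"))   # zhang: wrong if the source meant ch'ang
```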
The first round of conversion in OCLC will cover CONSER serial records. Next
will be the WorldCat Chinese records, newest records first. Then OCLC will
work on other records that contain Wade-Giles. The emphasis is on fixing
access points first. All conversion activity is targeted for completion by
Fall 2001.
Local file conversion will begin once the WorldCat software is done
because the same software will be used. There are six options, but the three
basic choices remain:
OCLC wants its users to know that OCLC has become much more proactive
about doing maintenance to the database and correcting errors in bibliographic
records.
Questions and Answers.
The meeting adjourned at 12:30.
Sharon Domier, Recorder
OCLC CJK Users Group 2001 Annual Meeting
Saturday, March 24, 2001
Holiday Inn Chicago City Center
Room Lasalle 1
300 East Ohio Street
Chicago, IL
(Continental Breakfast provided)
Agenda
8:50-8:55 a.m. Introduction (Wen-ling
Liu, Indiana University)
8:55-9:00 a.m. 10th Anniversary of the Users Group
(Karl Lo, U.C. San Diego)
9:00-9:40 a.m. Unicode and East Asian Ideographs
(John Jenkins, Unicode Consortium)
9:40-9:45 a.m. Update on the Harvard-Yenching Library's
Retrospective Conversion
(James Lin, Harvard)
9:45-10:00 a.m. Pinyin Task Force Report (Sarah
Elman, U.C.L.A.)
10:00-10:05 a.m. Questions and Answers
Minutes:
Recorder: Sharon Domier
Photographer: Abraham Yu
Website Management Committee: Maintained the OCLC CJK
Users Group Web site and managed the electronic voting process.
Membership Officer: Expanded the membership, kept track of member moves,
and generated a list of members' email addresses.
Nominating Committee: Nominated 11 members to stand for office. All were
thanked for their willingness to put their names up for the vote.
In the next three years, we will extend the present OCLC library cooperative
of 38,000 institutions in 76 countries into a truly global, digital community.
This will involve developing new Web based services, implementing a new
technological platform, and, most important, reaffirming a commitment to
library cooperation.
"Extending the OCLC cooperative: a three year strategy." <http://www.oclc.org/strategy/>
Also available as a PDF file. (Accessed 15 April 2001.)
1. Unicode text is simple to parse and process
2. Unicode text is not stateful (if you lose part of the text the rest
can still be interpreted)
3. Unicode encodes characters not glyphs
4. Unicode defines plain text (does not deal with rich text)
5. Unicode uses logical order (e.g. Bengali, Arabic)
6. Unicode unifies characters from different scripts (e.g. Chinese,
Japanese, Korean)
7. Unicode uses dynamic composition (so you can get to ideographs by
using description sequences)
8. Unicode uses equivalent sequences (e followed by a combining acute
accent is equivalent to é)
9. Unicode is convertible (it is a superset of most character sets
in current use, though not of EACC or CCCII); e.g., text can be converted
from Shift-JIS to Unicode to GB. Unicode will probably tackle EACC in the future.
10. There are a variety of benefits to Unicode use
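Three of these principles (2, 8, and 9) can be demonstrated with Python's standard library. The sketch below uses UTF-8 as the Unicode encoding form and the Shift-JIS, GB 2312, and ISO-2022-JP codecs that ship with Python; the sample strings are arbitrary choices for illustration, and ISO-2022-JP stands in as an example of a stateful legacy encoding.

```python
import unicodedata

# Principle 8 (equivalent sequences): "e" followed by U+0301 COMBINING ACUTE
# ACCENT is canonically equivalent to the precomposed "é" (U+00E9).
decomposed = "e\u0301"
precomposed = unicodedata.normalize("NFC", decomposed)
print(precomposed == "\u00e9")  # True

# Principle 9 (convertibility): Unicode as the pivot between legacy encodings,
# here Shift-JIS bytes -> Unicode string -> GB 2312 bytes.
sjis_bytes = "中文".encode("shift_jis")
text = sjis_bytes.decode("shift_jis")
gb_bytes = text.encode("gb2312")
print(gb_bytes.decode("gb2312") == "中文")  # True

# Principle 2 (no state): in UTF-8, losing the first byte damages only the
# character it belonged to; the rest of the text still decodes correctly.
utf8_bytes = "中文字".encode("utf-8")
damaged = utf8_bytes[1:].decode("utf-8", errors="replace")
print(damaged.endswith("文字"))  # True

# By contrast, ISO-2022-JP is stateful: an escape sequence switches into
# two-byte mode. Drop that escape and the following bytes are misread
# as ASCII letters.
iso_bytes = "中文".encode("iso2022_jp")
lost_escape = iso_bytes.replace(b"\x1b$B", b"", 1)
print(lost_escape.decode("iso2022_jp") == "中文")  # False
```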
We hope that OCLC will devote more effort to the authority control
aspect of the conversion program. Many libraries will depend on OCLC to
deliver authority records based on the converted bibliographic records
so it is important to ensure that access points in converted bibliographic
records are accurate and conform to the national authority file.
Metadata reflects OCLC's move from providing support for traditional cataloging
records to comprehensive metadata creation. The metadata format structures
will expand to include structures appropriate for materials held in museums,
art institutes, and other institutions. The new metadata structures
will be based on standards and can be integrated with local systems and
materials vendors. It will provide multiformat, multilanguage, multistyle,
comprehensive coverage. This is a change whereby OCLC is going to reach
beyond its own borders to find data. If records are not in OCLC, then we
will obtain records/data from other countries. OCLC will provide a variety
of services such as metadata maintenance (authorities, bib notification),
metalinking (tables of contents, links to vendors or publishers), Just-in-time
metadata (like PromptCat), and contract work (TechPro). At the same time,
OCLC will work to eliminate separate products (CatME, CJK, Passport) and
move to a one-stop shopping browser for metadata creation/retrieval.
Phase One will see an enhanced CatME and a merger of CORC and CatExpress.
Phase Two will see web-based ILL software and the elimination of Passport.
Phase Three will pull together all the metadata software and functions
into one package.
This involves creating a digital vault and making it available worldwide.
Examples would include harvesting and archiving websites.
3. Discovery and navigation
One example is acting as a Google Library Partner ("find it at my
library"). People could set up profiles that integrate local library holdings
and purchase options (Amazon etc.) into search engines. Another example
would be virtual reference services (24/7 Ask-a-Reference-Librarian service).
4. Service fulfillment
An example of this would be enhanced interlibrary loan and links between
interlibrary loan and bookstores. Another example might be enhanced profiling,
where content is delivered in your language.
Marty Withrow's final message to the group was "Weave Libraries into
the Web and the Web into Libraries."
Details on the Pinyin Conversion timeline and procedures are available
from: <http://lcweb.loc.gov/catdir/pinyin/>
Details on Local Catalog Conversion through OCLC are available from:
<http://www.oclc.org/oclc/pinyin/1localcat.htm>
Generally speaking, the conversion went well, but some Wade-Giles headings
were not retained in the conversion process because the cross-references
were not called for (name-title cross references, subordinate references).
Participants at ALA Midwinter called for those headings to be retained
for automated authority control. Approximately 25,000 headings have been
retrieved and added back to the records as cross references. They are coded
as |w nnea (earlier established form, may not display). The Library of
Congress will make all the converted headings available through its distribution.
See the OCLC website for prices and details.
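The |w nnea code packs several one-character values into a single subfield. The decoder below is a hypothetical sketch that reads only the two positions the report glosses. It assumes the MARC 21 authority layout in which position 2 (counting from 0) records an earlier form of heading and position 3 controls reference display, and it reuses the report's own wording for the meanings.

```python
from typing import List

# Hypothetical decoder for the |w control subfield on a see-from reference.
# Assumed MARC 21 authority positions (0-based):
#   2 = earlier form of heading, 3 = reference display.
# Only the codes mentioned in the report are decoded; others are ignored.
EARLIER_FORM = {"e": "earlier established form"}
REFERENCE_DISPLAY = {"a": "may not display"}

def describe_w(subfield_w: str) -> List[str]:
    """Return plain-language notes for the |w codes this sketch knows about."""
    code = subfield_w.ljust(4, "n")  # pad missing positions with 'n'
    notes = []
    if code[2] in EARLIER_FORM:
        notes.append(EARLIER_FORM[code[2]])
    if code[3] in REFERENCE_DISPLAY:
        notes.append(REFERENCE_DISPLAY[code[3]])
    return notes

print(describe_w("nnea"))  # ['earlier established form', 'may not display']
```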
University of Massachusetts Amherst