Count-Key-Data Disks

Date: Fri, 24 Jun 1994 19:22:20 -0700 (PDT)
From: Dick Wilmot 
Subject: Re: Disk Storage: EMC vs. IBM vs. StorageTek (long)
To: "Rodney D. Van Meter" 
In-Reply-To: <>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

                Dick Wilmot
                Editor, Independent RAID Report
                (510) 938-7425

On Thu, 23 Jun 1994, Rodney D. Van Meter wrote:

> As always, a thoughtful and informative assessment, Dick. It's people
> like you who make reading this newsgroup worthwhile.
> For those of us who don't live in the mainframe world, care to clue us
> in on how CKD disks work?

I don't want to bore non-mainframers with tedious details, so here's some
email which can be posted if you think it useful. Anyway, count-key-data
(CKD) disks format each track as you write a new file on that track (all
files have at least one track). The disk sectors (called blocks or
records) are as big as you say they are, from maybe 20 bytes up to a full
track. If you write tiny blocks then most of the track will be taken up
with the overhead of start and stop (gap) areas and headers. If you write
big blocks then you'll get much better disk space utilization (more data,
less overhead) and much higher bandwidth to and from disk.
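
To put rough numbers on the block size trade-off, here is a minimal Python
sketch. The track capacity and per-block overhead are made-up illustrative
values, not figures for any real device (real drives use model-specific
capacity formulas), and the function names are hypothetical.

```python
# Illustrative numbers only: not taken from any real disk model.
TRACK_BYTES = 50_000        # hypothetical raw track capacity
OVERHEAD_PER_BLOCK = 500    # hypothetical gap + count-field cost per block


def blocks_per_track(block_size: int) -> int:
    """How many blocks of block_size fit on one track, overhead included."""
    return TRACK_BYTES // (block_size + OVERHEAD_PER_BLOCK)


def utilization(block_size: int) -> float:
    """Fraction of the raw track that ends up holding user data."""
    return blocks_per_track(block_size) * block_size / TRACK_BYTES


if __name__ == "__main__":
    for size in (20, 100, 1_000, 4_000, 24_000):
        print(f"{size:>6} byte blocks: {blocks_per_track(size):>3} per track, "
              f"{utilization(size):.1%} of the track is data")
```

With these assumed values, 20 byte blocks leave under 4% of the track for
data, while two 24,000 byte blocks use 96% of it.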

CKD disks, like SCSI and IDE disks, have sector (count) ID fields and data
fields. CKD disks can also have a third kind of field between the ID and
the data, called a key field. A CKD disk can search the disk for key fields
with particular values and interrupt the host when one is found. CKD
disks can also have different sized data fields on the same track. So a
track might have a 100 byte data block followed by a 9,000 byte data
block followed by a 400 byte data block (each block being preceded by its
count [ID] field).
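
As a rough picture of that layout, here is a small Python model of a track
holding variable-length records with optional keys. It is only a sketch: the
class and function names are hypothetical, and the count field is reduced to
a record number plus the key and data lengths.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """One CKD record: a count (ID) field, an optional key, and the data."""
    record_number: int   # part of the count field: which record this is
    key: bytes           # empty when the record has no key field
    data: bytes

    @property
    def count(self) -> tuple:
        # The count field says which record follows and how long it is.
        return (self.record_number, len(self.key), len(self.data))


def search_key(track: List[Record], wanted: bytes) -> Optional[Record]:
    """Scan the track in order for a record whose key equals `wanted`.

    This stands in for the search the disk itself performs; the host is
    only interrupted once a match is found.
    """
    for record in track:
        if record.key == wanted:
            return record
    return None


# The example track from the text: 100, 9,000 and 400 byte data blocks.
track = [
    Record(1, b"", b"x" * 100),
    Record(2, b"SMITH", b"y" * 9_000),
    Record(3, b"WILMOT", b"z" * 400),
]
```

Calling `search_key(track, b"WILMOT")` returns the third record, whose
`count` is `(3, 6, 400)`; a key that is not on the track returns `None`.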

A host channel program running in the specialized channel I/O processor
can tell a CKD disk to seek to a particular cylinder and track and then
can specify that it should wait until a specified block is under the read
head. The host channel program can then, after being notified that block
#7, say, is coming under the disk head, issue a rewrite command. This
means that the gap between the ID (used to tell which data block follows
and what size it is) and the data block must be big enough to allow the
signal 65 microseconds to get to the host and back. If the gap isn't big
enough and the signal takes longer than the gap then it will be too late
to rewrite that block and the disk will have to wait a full revolution
before completing the rewrite. This can be disastrous for performance.
For sort and media restore utilities (which are sold very much on benchmarks)
you want to make sure your disks don't get caught on these small CKD
technicalities. This also causes a problem for cached CKD disk
controllers, which cannot safely accept a rewrite command for a block not
in cache because they don't know whether the block on disk is as big as
the one the host channel program is sending. A host might be updating an
8,000 byte block but on disk it is only 7,000 bytes. In that case the host
should get an I/O error. What is the controller to do? Delay the action
until it can fetch the block into cache, or lie and say it has done the
rewrite command and pray that the host did not make an error? Or (what IBM
has finally done) cache a table of block lengths, which are usually all the
same over large extents of disk (then the controller can be assured that
the host has not made an error in the number and size of blocks being
replaced).
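
The two problems above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the rotation time is a
made-up round figure, the table is keyed per track for simplicity rather
than per extent, and all names are hypothetical, not IBM's actual design.

```python
from typing import Dict, Tuple

REVOLUTION_US = 16_667.0   # hypothetical: one turn at about 3,600 rpm


def rewrite_delay_us(round_trip_us: float, gap_us: float) -> float:
    """Extra wait before the data field can be rewritten.

    If the host answers while the gap is still passing under the head, the
    write proceeds at once; otherwise the block has gone by and the disk
    waits one more revolution.
    """
    return 0.0 if round_trip_us <= gap_us else REVOLUTION_US


# Hypothetical controller table: (cylinder, head) -> block length in bytes,
# usable where every block on that track has the same length.
BlockLengths = Dict[Tuple[int, int], int]


def accept_cached_rewrite(table: BlockLengths, cyl: int, head: int,
                          host_length: int) -> str:
    """Decide what a cached controller can do with a rewrite command."""
    known = table.get((cyl, head))
    if known is None:
        return "fetch"    # length unknown: must read from disk first
    if known != host_length:
        return "error"    # host is sending the wrong size: I/O error
    return "accept"       # safe to take the write into cache
```

With a 65 microsecond round trip, a gap lasting 80 microseconds costs
nothing, while a 50 microsecond gap costs a whole revolution. With a table
of `{(10, 3): 7000}`, an 8,000 byte rewrite to that track is rejected
without touching the disk.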

IBM could probably replace the CKD (and ECKD successor) protocols with one
that has all same length records and no key fields, and they have been
moving their file systems that way, but ever so gradually. I think they are
in no hurry because they want to keep a barrier to entry for smaller
vendors who can't marshal the CKD expertise. Worse than competition is
competition from small hungry sharks.

Mainframe File Systems

> Pointers to discussion of the file system structure on mainframes
> would also be appreciated.
>               --Rod

IBM's file systems are a bewildering array. There are many different file
types: VSAM, SAM-E, PDS, PDS-E, OSAM, OAM, DIV and more. Each different file
type has its own drivers and, I think still, its own IBM development group.
There is no concept of disk partitioning so different file types are
intermingled in any fashion wanted on each disk.

VSAM is the "new" one. It was introduced in 1974 and was to have taken over
the IBM world and it did take over much of the production data. It comes
in several flavors. The keyed version is a B*-tree organization with front
and rear key compression in the indexes. Pretty slick. The unkeyed version
(entry sequenced) is used by the IMS database software (which can also
use the OSAM access method developed exclusively by the IMS development group)
as well as by some applications. Linear VSAM is a virtual storage image and
is used for paging files as well as by the DB2 database software. VSAM
files cannot contain source or executable programs.
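
For a feel of what front and rear key compression buy you, here is a
generic Python sketch of the idea. It is not VSAM's actual index record
layout; the function names and sample keys are made up for illustration.
Front compression drops the leading characters a key shares with the
previous key, and rear compression drops trailing characters that are not
needed to tell a key apart from the next one.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_compress(keys: List[str]) -> List[Tuple[int, str]]:
    """Store each key as (characters shared with previous key, remainder)."""
    out, prev = [], ""
    for key in keys:
        shared = common_prefix_len(prev, key)
        out.append((shared, key[shared:]))
        prev = key
    return out


def front_expand(entries: List[Tuple[int, str]]) -> List[str]:
    """Inverse of front_compress: rebuild the full keys in order."""
    keys, prev = [], ""
    for shared, rest in entries:
        prev = prev[:shared] + rest
        keys.append(prev)
    return keys


def rear_truncate(key: str, next_key: str) -> str:
    """Shortest prefix of key that is not also a prefix of next_key.

    Returns the whole key if every prefix of it is a prefix of next_key.
    """
    return key[:common_prefix_len(key, next_key) + 1]
```

For the sorted keys `["JOHNSON", "JOHNSTON", "JONES"]`, front compression
gives `[(0, "JOHNSON"), (5, "TON"), (2, "NES")]`, and
`rear_truncate("JOHNSTON", "JONES")` keeps just `"JOH"`.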

SAM-E (sequential access method extended) is sequential but I think you
can still access these files using the old BDAM direct access method to
get directly to blocks in the middle of the file.

PDS is the partitioned data set and is where you store your source and
executable programs unless you buy a library package that stores them in
some other way.

OAM is the object access method for storing large things like documents.

DIV is data in virtual, or what other folk call file mapping (into virtual
memory), and these files are actually stored in VSAM linear files when not
in use.

There are classes for the different access methods taught by IBM and others
(e.g. Amdahl, Hitachi) and you can get the technical manuals. VSAM
classes are maybe three weeks of intensive study after you know IBM's
batch Job Control Language. JCL is a very arcane language and is not much
different to what it looked like in 1965.

If I didn't know this stuff then I wouldn't start studying it now. It
will likely be of decreasing importance and there will be too many
mainframe programmers chasing too few jobs.