This topic is also discussed frequently in comp.os.research.
The Network File System was originally developed by Sun Microsystems and is now pretty standard in the Unix world; clients also exist for PC, Mac, VMS, and other non-Unix OSes. V2, the common version, supports single files only up to 2^32 (4GB) bytes. I'm not sure if there are any limits on file system size under NFS, other than those imposed by the client and server OSes (SHMO).
NFS is defined in RFC 1094. V3 is now RFC 1813.
There is at least one newsgroup devoted specifically to NFS: comp.protocols.nfs.
NFS V3 supports 64-bit files and write caching.
The first implementation was from Digital with DEC OSF/1 V3.0 for Alpha AXP. Silicon Graphics supports it on IRIX 5.3. Cray will support it on UNICOS 9. I don't know about other vendors but I have heard rumours that the releases coming in the second half of 1995 will support it.
Further information on NFS V3 can be found from
Solaris 2.5, available Nov. 95, is reported to have V3 support. Network Appliance has it as of 3.0, Sept. 95. (firstname.lastname@example.org (Guy Harris), 95/10/6)
The Andrew File System (SHMO). Allows naming of files worldwide as if they were a locally-mounted FS (from cooperating clients, of course).
There's an "alt" group for AFS - "alt.filesystems.afs". Available commercially from Transarc.
Another remote file system protocol that supports large files. I don't know anything about it, or if any implementations really exist yet.
Further Information:

%z InProceedings
%K hpdb:Rosenblum91
%s email@example.com (Thu Oct 17 11:12:07 1991)
%A Mendel Rosenblum
%A John K. Ousterhout
%y UCBCS.
%T The design and implementation of a log-structured file system
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 1 15
%x This paper presents a new technique for disk storage management
%x called a log-structured file system. A log-structured file system
%x writes all modifications to disk sequentially in a log-like
%x structure, thereby speeding up both file writing and crash
%x recovery. The log is the only structure on disk; it contains
%x indexing information so that files can be read back from the log
%x efficiently. In order to maintain large free areas on disk for
%x fast writing, we divide the log into segments and use a segment
%x cleaner to compress the live information from heavily fragmented
%x segments. We present a series of simulations that demonstrate the
%x efficiency of a simple cleaning policy based on cost and benefit.
%x We have implemented a prototype log-structured file system called
%x Sprite LFS; it outperforms current Unix file systems by an order of
%x magnitude for small-file writes while matching or exceeding Unix
%x performance for reads and large writes. Even when the overhead for
%x cleaning is included, Sprite LFS can use 70% of the disk bandwidth
%x for writing, whereas Unix file systems typically can use only
%x 5--10%.
(firstname.lastname@example.org)
Also, these papers:
Ousterhout and Douglis, "Beating the I/O Bottleneck: A Case for Log-structured File Systems", Operating Systems Review, No. 1, Vol. 23, pp. 11-27, 1989, also available as Technical Report UCB/CSD 88/467.
Rosenblum and Ousterhout, "The Design and Implementation of a Log-Structured File System", ACM SIGOPS Operating Systems Review, No. 5, Vol. 25, 1991.
Seltzer, "File System Performance and Transaction Support", PhD Thesis, University of California, Berkeley, 1992, also available as Technical Report UCB/ERL M92.
Seltzer, Bostic, McKusick and Staelin, "An Implementation of a Log-Structured File System for UNIX", Proc. of the Winter 1993 USENIX Conf., pp. 315-331, 1993.
listed from the man page for mount_lfs under FreeBSD-2.1.5. (rdv, 97/1/17)
A brief description of mainframe file systems (as well as CKD (Count, Key, Data) disks) by Dick Wilmot is available.
This discussion comes up occasionally on comp.arch and comp.os.research. I don't know which newsgroups/mailing lists the PIO (Parallel I/O) people hang out in, but it doesn't seem to be here. They show up occasionally in comp.sys.super and comp.parallel. They do have their own conferences, though.
The important work seems to be going on with the supercomputing gang -- LLNL, CMU, Caltech, UIUC, Dartmouth, ORNL, SNL, etc. Work is also being done by the parallel database community, including vendors such as Teradata.
A paper presented at the ACM International Supercomputing Conference in 1993 showed what to me seemed to be pretty appalling performance for reading data and distributing it to multiple processors on an Intel Delta supercomputer (sorry I don't have the reference in front of me). (rdv, 94/8/12) The paper is old, now, and the Intel guys say they have improved performance to up to 130 MB/sec. on the new Paragon using their Parallel File System (PFS).
There is an excellent web site on parallel I/O at Dartmouth:
There is also a mailing list housed at Dartmouth, email@example.com.
The annual conference is I/O in Parallel and Distributed Systems
(IOPADS); 1997's is co-located with Supercomputing '97 in San Jose,
Nov. 17. Papers are due March 25, 1997. See
I seem to recall that NT supports 64-bit file systems for its own native file systems? Anybody know for sure (SHMO)? (rdv, 94/8/24)
From *Inside the Windows NT(TM) File System*, by Helen Custer:
"NTFS allocates clusters and uses 64 bits to number them, which results in a possible 2^64 clusters, each up to 4KB. Each file can be of virtually infinite size, that is, 2^64 bytes long."
"Clusters" can be between 512 and 4K bytes.
The Win32 API supports 64-bit file sizes, albeit in a cheesy fashion reminiscent of V6 UNIX - no 64-bit integral types used, just pairs of 32-bit integral types. (firstname.lastname@example.org (Guy Harris), 95/10/6)
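To give a feel for what "pairs of 32-bit integral types" means in practice, here is a minimal sketch in portable C. The helper names are hypothetical and are not Win32 calls; the only assumption is a compiler that provides <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical helper (not a Win32 call): join the high and low
 * 32-bit halves of a file size into one 64-bit value. */
uint64_t join_size(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}

/* Hypothetical helper: split a 64-bit size back into its two halves. */
void split_size(uint64_t size, uint32_t *high, uint32_t *low)
{
    *high = (uint32_t)(size >> 32);
    *low  = (uint32_t)(size & 0xFFFFFFFFu);
}
```

A size with high = 1 and low = 0 is exactly 2^32 bytes, the first size that a single unsigned 32-bit field cannot express.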
There is now an industry group working on standardizing an API for
files larger than 2 GB (the max size normally supported on most Unix
systems). More info as I get it. The WWW-enabled can have a look at
www.sas.com:80 and see the various
proposals on the table.
Note that it is VERY easy to confuse whether an OS supports _files_ larger than 2 GB or _file systems_ larger than 2 GB. My table lists some of both (thanks to email@example.com (Benjamin Z. Goldsteen), Ed Hamrick (EdHamrick@aol.com) and Peter Poorman (firstname.lastname@example.org) for much of this information).
It is straightforward for systems with 64-bit integers to support 64-bit files; for systems with 32-bit integers it is more complex. On most 32-bit systems the file offsets and sizes passed around inside the kernel (most importantly, at the VFS layer) tend to be 32-bit signed integers, meaning no files larger than 2^31 bytes (2 GB).
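To make the limit concrete: a signed 32-bit offset tops out at 2^31 - 1, one byte short of 2 GB. The following is only a sketch, with a hypothetical function name, that checks whether a given file size can still be described that way.

```c
#include <stdint.h>

/* Largest value a signed 32-bit offset can hold: 2^31 - 1. */
#define MAX_OFFSET_32 INT32_MAX

/* Hypothetical check: returns 1 if a file of `size` bytes can be
 * described by a signed 32-bit offset, 0 otherwise. */
int fits_signed_32(uint64_t size)
{
    return size <= (uint64_t)MAX_OFFSET_32;
}
```

A size of 2147483647 bytes passes the check; 2147483648 (exactly 2^31) does not.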
On most systems, the argument to lseek is of type off_t, which (on SunOS and Linux, and plausibly on OSF/1 and others) is declared in a header file as "typedef long off_t;".
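If you want to see what off_t amounts to on a particular system, something like the following sketch will tell you. It assumes a POSIX-style <sys/types.h> that declares off_t; the function names are made up for illustration.

```c
#include <limits.h>
#include <sys/types.h>

/* Number of bits in this platform's off_t. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Returns 1 if off_t is wider than 32 bits, i.e. it can represent
 * offsets of 2^31 bytes and beyond; 0 otherwise. */
int off_t_is_large(void)
{
    return off_t_bits() > 32;
}
```

The result depends on the compiler and on any large-file options in effect, so treat it as a property of one build rather than of the OS as a whole.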
For a client to have access, three things are required: the client-side local file system, a suitable network protocol, and the server.
Even among the systems that _do_ support large files, not all do so transparently to the programmer or user. UniCOS is transparent, as is OSF/1; ConvexOS is not (there are two system calls, lseek and lseek64, with 32-bit and 64-bit file offsets, respectively, though the Fortran interface is transparent).
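Where the ordinary seek call only takes a narrow offset, a program has to guard against positions that do not fit before calling it. Below is a minimal sketch of such a guard, using only POSIX lseek; it does not assume that lseek64 exists, and the wrapper name seek_abs is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: seek to an absolute 64-bit position. If this
 * platform's off_t cannot hold the value, fail with EOVERFLOW rather
 * than silently truncating it. Returns the new position, or -1. */
int64_t seek_abs(int fd, int64_t pos)
{
    off_t narrow = (off_t)pos;

    if ((int64_t)narrow != pos) {   /* value did not survive the cast */
        errno = EOVERFLOW;
        return -1;
    }
    return (int64_t)lseek(fd, narrow, SEEK_SET);
}
```

On a system with a 64-bit off_t the range check never fires and the wrapper is just lseek; on a 32-bit off_t it turns a position past 2 GB into an explicit error.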
This brings up some related issues. A complete large-file implementation needs not only the system calls, but also the stdio library and the runtime libraries for the languages (Fortran, Cobol, ...). Further, system utilities (sed, dd, etc.) need to be capable of dealing with large files.
(It has been pointed out that the GNU C compiler runs on most of these machines, so it is possible to use "long long" as a 64-bit int on them, but what matters for file systems is the system compiler.)
Here's the start of a table on these. Really such a simple table can't do the problem justice, but it'll give you an idea. Keep in mind that many of these systems support many file system types; I've listed only the most interesting so far from this point of view. I'd like to flesh it out more completely, though.
A slightly more detailed description of certain implementations is available with the WWW version.
In addition, the HPSS (see above) supports large files, as does Unitree (though the Unitree interface to them is limited).
(info about non-Unix large FSes also welcome; SHMO)
OpenVMS (any version) supports 2TB files (32-bit unsigned block number, 9-bit offset) through its RMS interface (still limited to 2GB through the C run-time library), but file systems are limited to ~7GB (as of Open AXP 1.5 and OpenVMS VAX 6.0 the max volume size has been bumped to 1 TB). (from a friend, rdv, 94/8/26, and Rod Widdowson, Filesystems group, OpenVMS engineering, Scotland).
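The 2TB figure follows from the arithmetic: 2^32 blocks of 2^9 (512) bytes each gives 2^41 bytes. The sketch below spells that out; the helper names are hypothetical, and block numbers are treated as zero-based purely to keep the arithmetic simple, which is an assumption of this example rather than a statement about RMS.

```c
#include <stdint.h>

#define BLOCK_BYTES 512u   /* a 9-bit offset means 2^9 bytes per block */

/* Hypothetical helper: convert a (block number, offset-in-block) pair
 * into a flat byte position. Only the low 9 bits of `offset` are used. */
uint64_t block_to_byte(uint32_t block, uint32_t offset)
{
    return (uint64_t)block * BLOCK_BYTES + (offset & 0x1FFu);
}

/* Total addressable bytes: 2^32 blocks * 2^9 bytes = 2^41 (2 TB). */
uint64_t max_addressable_bytes(void)
{
    return (uint64_t)1 << (32 + 9);
}
```

The last addressable byte (block 0xFFFFFFFF, offset 511) sits at position 2^41 - 1.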
email me at email@example.com
Copyright 1996 Rod Van Meter