BlockFile is an interface for accessing files full of identically sized "blocks" of data. This interface is like IntFile, in that they both describe the abstract methods for accessing their respective files, and both have two concrete implementations: one for mapped file access, and the other for normal I/O access. As with IntFile, the concrete implementations for BlockFile share a significant portion of their code. This is shared between the two classes by having them both extend an abstract class called AbstractBlockFile, which in turn implements BlockFile. The concrete classes, MappedBlockFile and IOBlockFile, do not implement BlockFile directly.
Redundant Interfaces
This arrangement makes the BlockFile interface redundant, since AbstractBlockFile can be used in every context where BlockFile might have been. However, using an interface in this way is a well established technique in Kowari. It allows for the rapid development of alternative implementations as the need arises. It also permits an object that fulfils a number of roles to implement more than one interface. These advantages both help the codebase stay flexible for ongoing development.
Finally, these interfaces, redundant or not, provide a consistent way of documenting the method of using a class (its platonic "interface", as opposed to the grammatical construct in the Java language).
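The shape of this hierarchy can be sketched in a few lines. This is only an illustration, not Kowari's source: every name ending in `Sketch`, the `blockSize` field and the `describe()` method are assumptions made up for the example.

```java
// Hypothetical names throughout; this only shows the shape of the hierarchy.
interface BlockFileSketch {
    long getNrBlocks();
    String describe();
}

// Shared code lives here, and this is the only class that names the interface.
abstract class AbstractBlockFileSketch implements BlockFileSketch {
    protected final int blockSize;
    protected long nrBlocks;

    AbstractBlockFileSketch(int blockSize) {
        this.blockSize = blockSize;
    }

    @Override
    public long getNrBlocks() {
        return nrBlocks;
    }
}

// The concrete classes only extend the abstract class.
class MappedBlockFileSketch extends AbstractBlockFileSketch {
    MappedBlockFileSketch(int blockSize) {
        super(blockSize);
    }

    @Override
    public String describe() {
        return "mapped";
    }
}

class IOBlockFileSketch extends AbstractBlockFileSketch {
    IOBlockFileSketch(int blockSize) {
        super(blockSize);
    }

    @Override
    public String describe() {
        return "explicit";
    }
}
```

Calling code can still declare its variables as the interface type, which is what keeps the door open for an implementation that does not extend the abstract class.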
Records
The point of a BlockFile is to store a series of records of identical structure in a file. Each record will be exactly the same size. The data structure within that record is left up to the caller. Kowari uses BlockFiles to store numerous data types, including strings (of defined lengths), numbers, tree nodes, and linked lists.
Files which hold records like this are essential to scalable systems, as data from anywhere in the system can be found and modified rapidly. Variable length structures can be more space efficient, but without an index a file must be traversed before data can be found, and modification is often impossible. Variable length structures can be indexed for rapid retrieval, but even then modification is often impractical.
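The reason fixed-size records are so quick to reach is that the byte offset falls straight out of the record number. The following is a minimal sketch of that idea using the standard `FileChannel` API. The class name, the 16 byte record size and the two-long record layout are assumptions for illustration, and none of this is Kowari code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class FixedRecordDemo {
    static final int RECORD_SIZE = 16; // assumed layout: two longs per record

    // The byte offset of a record follows directly from its index.
    static long offsetOf(long recordId) {
        return recordId * RECORD_SIZE;
    }

    // Overwrites one record in place without touching its neighbours.
    static void writeRecord(FileChannel ch, long recordId, long a, long b) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        buf.putLong(a).putLong(b).flip();
        long pos = offsetOf(recordId);
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos);
        }
    }

    // Reads one record with a single positioned read, no traversal needed.
    static long[] readRecord(FileChannel ch, long recordId) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        long pos = offsetOf(recordId);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) {
                throw new IOException("record " + recordId + " is past the end of the file");
            }
            pos += n;
        }
        buf.flip();
        return new long[] { buf.getLong(), buf.getLong() };
    }
}
```

With variable length records there is no equivalent of `offsetOf()`, which is where the traversal or the separate index comes in.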
As a side note, the expected replacement for the Kowari file system will use variable length structures, but with inherent indexing to make traversal faster than it currently is. The problem with modifying data will still exist, but this will be managed in a linear annotation queue, which will be merged back into the main data silently in the background. The result should demonstrate much faster read and write operations, particularly for terabyte scale files.
setNrBlocks() and getNrBlocks() are used to set and get the length of the block file. Since the file may only be viewed as a collection of blocks, its length should only be measured in units of blocks.
As in all the other file-manipulating classes, clear() is used to remove all data from the file and set it to zero length, force() is used to block the process until all data is written to disk, unmap() attempts to remove any existing file mappings, close() closes the file, and delete() closes the file and removes it from the disk. See IntFile for a more detailed explanation.
The remaining methods are all for manipulating the blocks within this file.
The Block class was created to abstract away access to the records stored within a BlockFile. The block is always tied back to its location on disk, so any operations on that block will always be to the specific location on disk. This is important for memory mapped files, as it means that a write to a block will be immediately reflected on disk (though disk I/O can be delayed by the operating system) without performing any explicit write operations with the block.
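To make the memory mapped case concrete, here is a small sketch using the standard `FileChannel.map()` call. A block is represented as a slice of the mapping, so a write to the slice is a write to the mapped region. The class name, the method names and the 32 byte block size are all assumptions for the example, and Kowari's own Block class is certainly more involved than this.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedBlockDemo {
    static final int BLOCK_SIZE = 32; // assumed block size in bytes

    // Maps the first nrBlocks blocks of the file; the channel must be open for read and write.
    static MappedByteBuffer mapBlocks(FileChannel ch, int nrBlocks) throws IOException {
        return ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) nrBlocks * BLOCK_SIZE);
    }

    // Returns a view of a single block. The view shares memory with the mapping,
    // so nothing is copied and no explicit write call is needed afterwards.
    static ByteBuffer blockView(MappedByteBuffer map, int blockId) {
        ByteBuffer dup = map.duplicate();
        dup.clear();
        dup.limit((blockId + 1) * BLOCK_SIZE);
        dup.position(blockId * BLOCK_SIZE);
        return dup.slice();
    }
}
```

A caller that does `blockView(map, 2).putLong(0, 42L)` has changed the bytes at offset 64 of the mapped region. When those bytes reach the physical disk is up to the operating system, unless `force()` is called on the mapping.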
Implementations of BlockFile are also expected to manage Blocks through an object pool. This was particularly important in earlier JVMs, where object allocation and garbage collection were expensive operations. Fortunately, recent versions of Java no longer suffer these effects, and an object pool can even slow a process down.
The other reason to use an object pool for blocks is to prevent the need to re-read any blocks still in memory. This is not an issue for memory mapped files, but if the blocks are accessed with read/write operations then they need to be initialized by reading from disk. The object pool for blocks will re-use a block associated with a particular file address whenever one is available. If one is not available, then the standard pool technique of re-initializing an existing object is used. With Java 1.5, it is this last operation which may be discarded.
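The two behaviours of the pool can be sketched as follows. This is a guess at the general technique, not Kowari's ObjectPool: the class names, the `loaded` flag and the choice to recycle the oldest pooled block are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical stand-in for a block: an ID plus a private buffer.
class PooledBlock {
    long blockId;
    boolean loaded; // true when data already holds the contents for blockId
    final ByteBuffer data;

    PooledBlock(long blockId, int size) {
        this.blockId = blockId;
        this.data = ByteBuffer.allocate(size);
    }
}

class BlockPoolSketch {
    private final int blockSize;
    // Released blocks, keyed by block ID, kept in the order they were released.
    private final LinkedHashMap<Long, PooledBlock> free = new LinkedHashMap<>();

    BlockPoolSketch(int blockSize) {
        this.blockSize = blockSize;
    }

    PooledBlock get(long blockId) {
        // Best case: a block for this ID is still pooled, so no disk read is needed.
        PooledBlock b = free.remove(blockId);
        if (b != null) {
            return b;
        }
        // Otherwise recycle the oldest pooled block and mark it as needing a read.
        Iterator<PooledBlock> it = free.values().iterator();
        if (it.hasNext()) {
            b = it.next();
            it.remove();
            b.blockId = blockId;
            b.loaded = false;
            b.data.clear();
            return b;
        }
        // Empty pool: fall back to a plain allocation.
        return new PooledBlock(blockId, blockSize);
    }

    void put(PooledBlock b) {
        free.put(b.blockId, b);
    }
}
```

The recycling branch is the part that a modern JVM makes optional. The lookup by block ID is the part that still saves a read.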
As a consequence of the above, the remaining BlockFile methods all accept an ObjectPool as a parameter. This may appear unnecessary, since a pool would usually be associated with an instance of the BlockFile class, but in this case there are several pools which may be used, depending on the needs of the calling code.
allocateBlock() is used to get a Block for a particular file position, defined by a blockId parameter, without reading the file for any existing data. This is for new positions in the file where the data was previously unused. There is no reason to read the data at this point, and the read operation would be time consuming. Writing the resulting block will put its contents into the file at the given block ID.
readBlock() is similar to allocateBlock(), but the block is already expected to have data in it. The Block object is allocated and associated with the appropriate point in the file, and data is read into it.
Since read operations are never explicit in memory mapped files, the allocateBlock() and readBlock() methods are identical in MappedBlockFile.
Any blocks obtained from readBlock() should be passed to releaseBlock() once the caller has finished with them. This allows the block to be put back into the object pool. While not strictly necessary, it should improve performance.
writeBlock() writes the contents of a Block's buffer to that Block's location on disk. This performs a write in IOBlockFile and is an empty operation (a no-op) in MappedBlockFile.
The copyBlock() method sets a new block ID on a given Block. This now means that the same data is set for a new block. On IOBlockFile this is a simple operation, as the buffer remains unchanged. However, on MappedBlockFile the contents of the first block have to be copied into the new block's position. This is unusual, since MappedBlockFile does less work in most of its methods.
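The difference between the two cases is easy to see in a sketch. Everything here is hypothetical: the class and method names are invented, the block size is arbitrary, and a plain heap ByteBuffer stands in for the mapped region so that the example needs no file.

```java
import java.nio.ByteBuffer;

class CopyBlockSketch {
    static final int BLOCK_SIZE = 8; // assumed block size in bytes

    // Explicit I/O style: the block owns a private buffer.
    static final class IoBlock {
        long blockId;
        final ByteBuffer buffer = ByteBuffer.allocate(BLOCK_SIZE);

        IoBlock(long blockId) {
            this.blockId = blockId;
        }
    }

    // Only the ID changes; the buffer already holds the data and is written out later.
    static void copyIo(IoBlock block, long dstId) {
        block.blockId = dstId;
    }

    // Mapped style: 'file' stands in for the mapped region, so the bytes
    // themselves must be moved to the destination block's position.
    static void copyMapped(ByteBuffer file, long srcId, long dstId) {
        if (srcId == dstId) {
            return;
        }
        ByteBuffer src = file.duplicate();
        src.clear();
        src.limit((int) ((srcId + 1) * BLOCK_SIZE));
        src.position((int) (srcId * BLOCK_SIZE));
        ByteBuffer dst = file.duplicate();
        dst.clear();
        dst.position((int) (dstId * BLOCK_SIZE));
        dst.put(src);
    }
}
```

In the explicit case the copy is deferred until the block is next written. In the mapped case there is no later write to defer it to, so the copy has to happen immediately.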
writeBlock() simply writes a block's buffer back to disk. This is a no-op in MappedBlockFile.
freeBlock() should be removed from this class, as they are operations which are designed for ManagedBlockFile. This class provides a layer of abstraction over BlockFile and will be described later. The only implementations of these methods is in
Block File Types
The BlockFile interface also includes a static inner class which defines an enumeration of the types of BlockFile. This is the standard enumeration pattern used before Java 1.5.
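For anyone who has not seen it, the pre-1.5 pattern looks roughly like the following. This is a generic sketch of the typesafe enumeration idiom, not the Kowari class: the class name is invented, and the constant names are an assumption based on the mapped, explicit and auto settings mentioned in this post.

```java
// A sketch of the typesafe enumeration idiom used before the enum keyword existed.
final class IOTypeSketch {
    // The only instances that can ever exist, since the constructor is private.
    static final IOTypeSketch MAPPED = new IOTypeSketch("mapped");
    static final IOTypeSketch EXPLICIT = new IOTypeSketch("explicit");
    static final IOTypeSketch AUTO = new IOTypeSketch("auto");

    private final String name;

    private IOTypeSketch(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}
```

Because no other instances can be created, the constants can be compared safely with `==`, which is most of what the later enum keyword gives you.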
The defined types are:
- MAPPED: The file is memory mapped. Reading and writing are implicit, and the read/write methods perform no function.
- EXPLICIT: The file is accessed with the standard read/write mechanism.
- AUTO: The file is accessed by memory mapping if the operating system supports a 64 bit address space. Otherwise file access is explicit.
- DEFAULT: This is set to AUTO, unless the value is overridden at the command line. The override is performed by defining tucana.xa.defaultIOType as mapped, explicit or auto.
This enumeration class is used by the factory methods in AbstractBlockFile to determine which concrete implementation should be instantiated.
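The selection logic could look something like the sketch below. Only the property name tucana.xa.defaultIOType and its three values come from the description above. The class, its methods, and the use of a modern enum are assumptions, and the 64 bit check is a heuristic based on the vendor-specific `sun.arch.data.model` property with a fallback to `os.arch`, which may not be how Kowari detects it.

```java
import java.util.Locale;

class IOTypeResolver {
    // The two concrete kinds of file a factory could end up creating.
    enum Kind { MAPPED, EXPLICIT }

    // Turns a setting of "mapped", "explicit" or "auto" into a concrete kind.
    // A missing setting is treated as "auto".
    static Kind resolve(String setting, boolean is64Bit) {
        String s = (setting == null) ? "auto" : setting.trim().toLowerCase(Locale.ROOT);
        switch (s) {
            case "mapped":
                return Kind.MAPPED;
            case "explicit":
                return Kind.EXPLICIT;
            case "auto":
                return is64Bit ? Kind.MAPPED : Kind.EXPLICIT;
            default:
                throw new IllegalArgumentException("Unknown I/O type: " + setting);
        }
    }

    // Heuristic only: sun.arch.data.model is not available on every JVM.
    static boolean looksLike64Bit() {
        String model = System.getProperty("sun.arch.data.model");
        if (model != null) {
            return model.equals("64");
        }
        return System.getProperty("os.arch", "").contains("64");
    }

    // Reads the command line override, if any, and resolves it.
    static Kind fromSystemProperty() {
        return resolve(System.getProperty("tucana.xa.defaultIOType"), looksLike64Bit());
    }
}
```

Running the JVM with `-Dtucana.xa.defaultIOType=explicit` would make `fromSystemProperty()` return `EXPLICIT` regardless of the platform.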