External Methods - Files

Two types of files
    Sequential Access
        Records are read/written in order from beginning to end
    Direct (Random) Access
        Records are accessed directly, as in an array

File Organization
    A file is a collection of BLOCKS
    A block is a collection of RECORDS
    A record is a collection of FIELDS

File Access
    All input/output is done at block level, called BLOCK ACCESS
    Block size is determined by configuration of:
        hardware
        operating system

Sorting Data
    A sort requires data to be in memory
    This is not possible for large data files
    Need to use a modified Merge Sort

Modified Merge Sort
    Phase I:
        1. Read a block into an array from File1
        2. Sort the array
        3. Write the array to File2
        4. Repeat Steps 1-3 until EOF
        5. File2 now contains SORTED RUNS
    Phase II:
        Merge pairs of sorted runs to form larger sorted runs
        Repeat until only one sorted run remains
    Problem: Run size may eventually exceed memory capacity
    Solution: Input and Output must be BUFFERED for each pair of runs

Buffering Input and Output for a Pair of Runs
    1. Create 3 internal arrays: Input1, Input2, Output
    2. Read a block from Run1 into Input1
    3. Read a block from Run2 into Input2
    4. Merge Input1 and Input2 into Output
    5. If Input1 or Input2 becomes exhausted, read another block from the appropriate run
    6. If Output becomes full, write its contents to the external sort file

External Methods - Tables

Searching
    Simple binary search (the file must be sorted by key)
        1. Recursively split the file segment in half (initially, the entire file)
        2. Read the middle block of the file segment into an array
        3. Determine if the search key is in the block by comparing it to the first and last record key in the block; if it is smaller than the first key, continue with the lower half, and if it is larger than the last key, continue with the upper half
        4. Otherwise, search the block for the search key

Inserting and Deleting
    Problem: Keeping a large data file in sorted order is costly, since an insertion or deletion may require shifting large amounts of data
    Solution: Create an index file
        The index file is smaller, and therefore less costly to keep in sorted order
        Insertion into the data file can now be in any convenient location, since the data doesn't need to be sorted

Layout of an Index File
    Each record in the file contains two parts:
        1. The key value matching the actual record in the data file
        2. A number "pointing to" the block in the data file that contains the record

Insertion and Deletion Revisited
    Deletion:
        Fill the record in the data file with spaces, and remove the pointer record from the index file
        Keep track of deleted records in another external file
    Insertion:
        Use one of the records marked as deleted, or place the new record at the end of the file

Hashing
    An index file may become too large to manage insertions and deletions
    Use the hashing schemes from Chapter 12 on the index file
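
The index-file scheme described under "Layout of an Index File" and "Insertion and Deletion Revisited" can be sketched in a few lines. This is only an illustrative in-memory model, not code from the notes: the class name IndexedFile, the constants BLOCK_SIZE and BLANK, and the use of Python lists in place of real external files are all assumptions, and keys are assumed to be unique.

```python
import bisect

BLOCK_SIZE = 2   # records per block (assumed; tiny for illustration)
BLANK = None     # stands in for a record filled with spaces


class IndexedFile:
    """Hypothetical in-memory model of an unsorted data file plus a sorted index."""

    def __init__(self):
        self.blocks = []     # data file: list of blocks, records in no order
        self.keys = []       # index file: key values, kept sorted
        self.block_nos = []  # index file: block number for the key at the same position
        self.free = []       # "deleted records" file: (block_no, slot) pairs

    def insert(self, key, record):
        if self.free:
            # Reuse a record slot that was marked as deleted.
            block_no, slot = self.free.pop()
        else:
            # Otherwise place the record at the end of the data file.
            if not self.blocks or BLANK not in self.blocks[-1]:
                self.blocks.append([BLANK] * BLOCK_SIZE)
            block_no = len(self.blocks) - 1
            slot = self.blocks[block_no].index(BLANK)
        self.blocks[block_no][slot] = (key, record)
        # Only the small index has to stay in sorted order.
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.block_nos.insert(pos, block_no)

    def _index_position(self, key):
        # Binary search of the index; returns None if the key is absent.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return pos
        return None

    def find(self, key):
        pos = self._index_position(key)
        if pos is None:
            return None
        block = self.blocks[self.block_nos[pos]]  # a single block access
        for entry in block:
            if entry is not BLANK and entry[0] == key:
                return entry[1]
        return None

    def delete(self, key):
        pos = self._index_position(key)
        if pos is None:
            return False
        block_no = self.block_nos[pos]
        block = self.blocks[block_no]
        for slot, entry in enumerate(block):
            if entry is not BLANK and entry[0] == key:
                block[slot] = BLANK                  # blank out the data record
                self.free.append((block_no, slot))   # remember the free slot
                break
        del self.keys[pos]                           # remove the pointer record
        del self.block_nos[pos]
        return True
```

In this sketch no data record is ever shifted: an insertion touches one block and the index, and a deletion leaves a blank slot that a later insertion can reuse.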