外部排序是一種對大量數據進行排序的方法,當數據量超過內存容量時,可以使用外部排序。外部排序的基本思想是將數據分成多個小塊,對每個小塊進行排序,然后將這些小塊合并成一個有序的大文件。這里是一個簡單的外部排序算法實現:
下面是一個簡單的Java實現:
import java.io.*;
import java.util.Arrays;
public class ExternalSort {
public static void main(String[] args) throws IOException {
// 假設有一個包含大量整數的文件 input.txt
File inputFile = new File("input.txt");
// 輸出文件名
File outputFile = new File("output.txt");
// 外部排序
externalSort(inputFile, outputFile);
}
public static void externalSort(File inputFile, File outputFile) throws IOException {
// 1. 讀取輸入文件,將其分割成多個小塊
List<File> chunkFiles = splitInputFile(inputFile);
// 2. 對每個小塊進行內部排序,并將排序后的小塊寫入到磁盤上的臨時文件中
List<File> sortedChunkFiles = sortChunks(chunkFiles);
// 3. 使用歸并排序的思想,將所有排序后的小塊合并成一個有序的大文件
mergeSortedChunks(sortedChunkFiles, outputFile);
}
private static List<File> splitInputFile(File inputFile) throws IOException {
List<File> chunkFiles = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new FileReader(inputFile))) {
List<String> lines = new ArrayList<>();
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
if (lines.size() == 1000) { // 每個小塊的大小為1000行
chunkFiles.add(sortAndSaveChunk(lines));
lines.clear();
}
}
if (!lines.isEmpty()) {
chunkFiles.add(sortAndSaveChunk(lines));
}
}
return chunkFiles;
}
private static File sortAndSaveChunk(List<String> lines) throws IOException {
Collections.sort(lines);
File chunkFile = File.createTempFile("chunk", ".txt");
chunkFile.deleteOnExit(); // 確保程序退出時刪除臨時文件
try (BufferedWriter writer = new BufferedWriter(new FileWriter(chunkFile))) {
for (String line : lines) {
writer.write(line);
writer.newLine();
}
}
return chunkFile;
}
private static List<File> sortChunks(List<File> chunkFiles) throws IOException {
PriorityQueue<FileBuffer> queue = new PriorityQueue<>((a, b) -> Integer.compare(a.size(), b.size()));
for (File chunkFile : chunkFiles) {
FileBuffer buffer = new FileBuffer(chunkFile);
if (buffer.size() > 0) {
queue.add(buffer);
}
}
List<File> sortedChunkFiles = new ArrayList<>();
while (!queue.isEmpty()) {
FileBuffer buffer = queue.poll();
sortedChunkFiles.add(buffer.file);
if (buffer.size() > 0) {
FileBuffer nextBuffer = new FileBuffer(buffer.file);
nextBuffer.readNextLine();
queue.add(nextBuffer);
}
}
return sortedChunkFiles;
}
private static void mergeSortedChunks(List<File> sortedChunkFiles, File outputFile) throws IOException {
PriorityQueue<FileBuffer> queue = new PriorityQueue<>((a, b) -> Integer.compare(a.size(), b.size()));
for (File chunkFile : sortedChunkFiles) {
FileBuffer buffer = new FileBuffer(chunkFile);
if (buffer.size() > 0) {
queue.add(buffer);
}
}
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
while (!queue.isEmpty()) {
FileBuffer buffer = queue.poll();
writer.write(buffer.popNextLine());
writer.newLine();
if (buffer.size() > 0) {
FileBuffer nextBuffer = new FileBuffer(buffer.file);
nextBuffer.readNextLine();
queue.add(nextBuffer);
}
}
}
}
static class FileBuffer implements Comparable<FileBuffer> {
BufferedReader reader;
String line;
File file;
int size;
public FileBuffer(File file) throws IOException {
this.file = file;
this.reader = new BufferedReader(new FileReader(file));
this.size = (int) file.length();
this.line = reader.readLine();
}
public int size() {
return size;
}
public String popNextLine() throws IOException {
String temp = line;
line = reader.readLine();
return temp;
}
@Override
public int compareTo(FileBuffer other) {
return Integer.compare(this.size(), other.size());
}
}
}
這個示例中,我們首先將輸入文件分割成多個小塊,然后對每個小塊進行排序并保存到臨時文件中。接下來,我們使用優先隊列(最小堆)將所有排序后的小塊合并成一個有序的大文件。