Read a File and Store in a New File in Java
Reading files in Java is the cause of a lot of confusion. There are multiple ways of accomplishing the same task and it's often not clear which file reading method is best to use. Something that's quick and dirty for a small example file might not be the best method to use when you need to read a very large file. Something that worked in an earlier Java version might not be the preferred method anymore.
This article aims to be the definitive guide for reading files in Java 7, 8 and 9. I'm going to cover all the ways you can read files in Java. Too often, you'll read an article that tells you one way to read a file, only to discover later there are other ways to do that. I'm actually going to cover 15 different ways to read a file in Java. I'm going to cover reading files in multiple ways with the core Java libraries as well as two third-party libraries.
But that's not all – what good is knowing how to do something in multiple ways if you don't know which way is best for your situation?
I also put each of these methods to a real performance test and document the results. That way, you will have some hard data to know the performance metrics of each method.
Methodology
JDK Versions
Java code samples don't live in isolation, especially when it comes to Java I/O, as the API keeps evolving. All code for this article has been tested on:
- Java SE 7 (jdk1.7.0_80)
- Java SE 8 (jdk1.8.0_162)
- Java SE 9 (jdk-9.0.4)
When there is an incompatibility, it will be stated in that section. Otherwise, the code works unaltered for different Java versions. The main incompatibility is the use of lambda expressions, which were introduced in Java 8.
Java File Reading Libraries
There are multiple ways of reading from files in Java. This article aims to be a comprehensive collection of all the different methods. I will cover:
- java.io.FileReader.read()
- java.io.BufferedReader.readLine()
- java.io.FileInputStream.read()
- java.io.BufferedInputStream.read()
- java.nio.file.Files.readAllBytes()
- java.nio.file.Files.readAllLines()
- java.nio.file.Files.lines()
- java.util.Scanner.nextLine()
- org.apache.commons.io.FileUtils.readLines() – Apache Commons
- com.google.common.io.Files.readLines() – Google Guava
Closing File Resources
Prior to JDK7, when opening a file in Java, all file resources would need to be manually closed using a try-catch-finally block. JDK7 introduced the try-with-resources statement, which simplifies the process of closing streams. You no longer need to write explicit code to close streams because the JVM will automatically close the stream for you, whether an exception occurred or not. All examples used in this article use the try-with-resources statement for importing, loading, parsing and closing files.
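To make the difference concrete, here is a minimal sketch of the same read written both ways. The class name, the method names and the temporary file are made up for illustration and are not part of the article's code download.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseResourcesSketch {

    // Pre-JDK7 style: the reader has to be closed by hand in a finally block.
    static String firstLineWithFinally(String fileName) throws IOException {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(fileName));
            return reader.readLine();
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }

    // JDK7+ style: the reader is closed automatically when the try block exits.
    static String firstLineWithResources(String fileName) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file so the sketch does not depend on C:\temp
        Path tempFile = Files.createTempFile("close-sketch", ".txt");
        Files.write(tempFile, "hello\nworld\n".getBytes(StandardCharsets.US_ASCII));

        System.out.println(firstLineWithFinally(tempFile.toString()));
        System.out.println(firstLineWithResources(tempFile.toString()));

        Files.delete(tempFile);
    }
}
```

Both methods return the same line; the second simply has less code to get wrong.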
File Location
All examples will read test files from C:\temp.
Encoding
Character encoding is not explicitly saved with text files, so Java makes assumptions about the encoding when reading files. Usually, the assumption is correct but sometimes you want to be explicit when instructing your programs to read from files. When the encoding isn't correct, you'll see funny characters appear when reading files.
All examples for reading text files use two encoding variations: the default system encoding, where no encoding is specified, and an explicitly set UTF-8 encoding.
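The "funny characters" effect is easy to reproduce without any file at all. The sketch below is only an illustration (the class and method names are invented): it encodes a four-character word as UTF-8 and then decodes the same bytes with the right and the wrong charset.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchSketch {

    // Decode the bytes with the encoding they were actually written in.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode the same bytes as ISO-8859-1, simulating a wrong assumption.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an accented e; the escape keeps this source ASCII-only
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeAsUtf8(utf8Bytes).length());   // prints 4
        System.out.println(decodeAsLatin1(utf8Bytes).length()); // prints 5
    }
}
```

The accented letter takes two bytes in UTF-8, so the wrong decoder turns it into two unrelated characters.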
Download Code
All code files are available from GitHub.
Code Quality and Code Encapsulation
There is a difference between writing code for your personal or work project and writing code to explain and teach concepts.
If I was writing this code for my own project, I would use proper object-oriented principles like encapsulation, abstraction, polymorphism, etc. But I wanted to make each example stand alone and easily understood, which meant that some of the code has been copied from one example to the next. I did this on purpose because I didn't want the reader to have to figure out all the encapsulation and object structures I so cleverly created. That would take away from the examples.
For the same reason, I chose NOT to write these examples with a unit testing framework like JUnit or TestNG because that's not the purpose of this article. That would add another library for the reader to understand that has nothing to do with reading files in Java. That's why all the examples are written inline inside the main method, without extra methods or classes.
My main purpose is to make the examples as easy to understand as possible and I believe that having extra unit testing and encapsulation code will not help with this. That doesn't mean that's how I would encourage you to write your own personal code. It's just the way I chose to write the examples in this article to make them easier to understand.
Exception Handling
All examples declare any checked exceptions in the throwing method declaration.
The purpose of this article is to show all the different ways to read from files in Java – it's not meant to show how to handle exceptions, which will be very specific to your situation.
So instead of creating unhelpful try catch blocks that just print exception stack traces and clutter up the code, all examples will declare any checked exception in the calling method. This will make the code cleaner and easier to understand without sacrificing any functionality.
Future Updates
As Java file reading evolves, I will be updating this commodity with any required changes.
File Reading Methods
I organized the file reading methods into three groups:
- Classic I/O classes that have been part of Java since before JDK 1.7. This includes the java.io and java.util packages.
- New Java I/O classes that have been part of Java since JDK 1.7. This covers the java.nio.file.Files class.
- Third party I/O classes from the Apache Commons and Google Guava projects.
Classic I/O – Reading Text
1a) FileReader – Default Encoding
FileReader reads in one character at a time, without any buffering. It's meant for reading text files. It uses the default character encoding on your system, so I have provided examples for both the default case, as well as specifying the encoding explicitly.
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_FileReader_Read {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        try (FileReader fileReader = new FileReader(fileName)) {
            int singleCharInt;
            char singleChar;
            while ((singleCharInt = fileReader.read()) != -1) {
                singleChar = (char) singleCharInt;

                // display one character at a time
                System.out.print(singleChar);
            }
        }
    }
}
1b) FileReader – Explicit Encoding (InputStreamReader)
It's actually not possible to set the encoding explicitly on a FileReader so you have to use the parent class, InputStreamReader, and wrap it around a FileInputStream:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_FileReader_Read_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        FileInputStream fileInputStream = new FileInputStream(fileName);

        // specify UTF-8 encoding explicitly
        try (InputStreamReader inputStreamReader =
                new InputStreamReader(fileInputStream, "UTF-8")) {

            int singleCharInt;
            char singleChar;
            while ((singleCharInt = inputStreamReader.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar); // display one character at a time
            }
        }
    }
}
2a) BufferedReader – Default Encoding
BufferedReader reads an entire line at a time, instead of one character at a time like FileReader. It's meant for reading text files.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        FileReader fileReader = new FileReader(fileName);

        try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
2b) BufferedReader – Explicit Encoding
In a similar fashion to how we set encoding explicitly for FileReader, we need to create FileInputStream, wrap it inside InputStreamReader with an explicit encoding and pass that to BufferedReader:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_BufferedReader_ReadLine_Encoding {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        FileInputStream fileInputStream = new FileInputStream(fileName);
        // specify UTF-8 encoding explicitly
        InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");

        try (BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
Classic I/O – Reading Bytes
1) FileInputStream
FileInputStream reads in one byte at a time, without any buffering. While it's meant for reading binary files such as images or audio files, it can still be used to read text files. It's similar to reading with FileReader in that you're reading one character at a time as an integer and you need to cast that int to a char to see the ASCII value.
FileInputStream itself returns raw bytes and does not apply any character encoding, so casting each byte to a char as in this example only gives the right result for single-byte characters such as ASCII.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_FileInputStream_Read {
    public static void main(String[] pArgs) throws FileNotFoundException, IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (FileInputStream fileInputStream = new FileInputStream(file)) {
            int singleCharInt;
            char singleChar;

            // read() returns the next byte as an int, or -1 at the end of the file
            while ((singleCharInt = fileInputStream.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar);
            }
        }
    }
}
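Reading one byte per call is the simplest form, but FileInputStream also has a read(byte[]) overload that fills a whole array on each call. The following is a minimal sketch of that variant; the class name, the chunk size and the temporary file are assumptions for illustration, and this variant was not part of the performance tests in this article.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedReadSketch {

    // Reads the file in chunks and returns the total number of bytes read.
    static long countBytes(String fileName, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        long total = 0;
        try (FileInputStream in = new FileInputStream(fileName)) {
            int bytesRead;
            // read(byte[]) returns how many bytes were stored in the array, or -1 at end of file
            while ((bytesRead = in.read(chunk)) != -1) {
                total += bytesRead;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file so the sketch does not depend on C:\temp
        Path tempFile = Files.createTempFile("chunk-sketch", ".txt");
        Files.write(tempFile, "0123456789".getBytes(StandardCharsets.US_ASCII));

        System.out.println(countBytes(tempFile.toString(), 4)); // prints 10

        Files.delete(tempFile);
    }
}
```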
2) BufferedInputStream
BufferedInputStream reads a set of bytes all at once into an internal byte array buffer. The buffer size can be set explicitly or use the default, which is what we'll demonstrate in our example. The default buffer size appears to be 8KB but I have not explicitly verified this. All performance tests used the default buffer size so it will automatically re-size the buffer when it needs to.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_BufferedInputStream_Read {
    public static void main(String[] pArgs) throws FileNotFoundException, IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);
        FileInputStream fileInputStream = new FileInputStream(file);

        try (BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
            int singleCharInt;
            char singleChar;
            // each read() is served from the internal buffer rather than from disk
            while ((singleCharInt = bufferedInputStream.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar);
            }
        }
    }
}
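For completeness, setting the buffer size explicitly uses the two-argument BufferedInputStream constructor. This is a minimal sketch only: the 16KB value, the class name and the temporary file are arbitrary choices for illustration, and no claim is made here about which size performs best.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferSizeSketch {

    // Counts the bytes in a file, reading through a buffer of the given size.
    static long countBytes(String fileName, int bufferSize) throws IOException {
        long total = 0;
        try (BufferedInputStream in =
                new BufferedInputStream(new FileInputStream(fileName), bufferSize)) {
            while (in.read() != -1) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file so the sketch does not depend on C:\temp
        Path tempFile = Files.createTempFile("buffer-sketch", ".txt");
        Files.write(tempFile, "some sample text".getBytes(StandardCharsets.US_ASCII));

        // 16KB is an arbitrary example value, not a recommendation
        System.out.println(countBytes(tempFile.toString(), 16 * 1024)); // prints 16

        Files.delete(tempFile);
    }
}
```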
New I/O – Reading Text
1a) Files.readAllLines() – Default Encoding
The Files class is part of the new Java I/O classes introduced in jdk1.7. It only has static utility methods for working with files and directories.
The readAllLines() method that takes no encoding argument was introduced in jdk1.8, so this example will not work in Java 7. Note that this overload decodes the file as UTF-8 rather than using the default system encoding.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // loads every line of the file into memory at once
        List<String> fileLinesList = Files.readAllLines(file.toPath());
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
1b) Files.readAllLines() – Explicit Encoding
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        List<String> fileLinesList = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);

        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
2a) Files.lines() – Default Encoding
This code was tested to work in Java 8 and 9. Java 7 didn't run because of the lack of support for lambda expressions.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // the stream holds the file open, so it is closed via try-with-resources
        try (Stream<String> linesStream = Files.lines(file.toPath())) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}
2b) Files.lines() – Explicit Encoding
Just like in the previous example, this code was tested and works in Java 8 and 9 but not in Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        try (Stream<String> linesStream = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}
3a) Scanner – Default Encoding
The Scanner class was introduced in jdk1.5 and can be used to read from files or from the console (user input).
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine {
    public static void main(String[] pArgs) throws FileNotFoundException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (Scanner scanner = new Scanner(file)) {
            String line;
            // hasNextLine() returns false once the end of the file is reached
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();
                System.out.println(line);
            }
        }
    }
}
3b) Scanner – Explicit Encoding
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine_Encoding {
    public static void main(String[] pArgs) throws FileNotFoundException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            String line;
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();
                System.out.println(line);
            }
        }
    }
}
New I/O – Reading Bytes
Files.readAllBytes()
Even though the documentation for this method states that "it is not intended for reading in large files" I found this to be the absolute best performing file reading method, even on files as large as 1GB.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // reads the whole file into a single byte array
        byte[] fileBytes = Files.readAllBytes(file.toPath());
        char singleChar;
        for (byte b : fileBytes) {
            singleChar = (char) b;
            System.out.print(singleChar);
        }
    }
}
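Casting each byte to a char, as in the example above, only works for single-byte characters. When the file contains text in a known encoding, the whole byte array can instead be decoded in one step. A minimal sketch, assuming the file is UTF-8 (the class name, method name and temporary file are invented for illustration):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadAllBytesToStringSketch {

    // Reads the whole file and decodes it as UTF-8 text.
    static String readAsString(Path path) throws IOException {
        byte[] fileBytes = Files.readAllBytes(path);
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file containing one non-ASCII character (i with diaeresis)
        Path tempFile = Files.createTempFile("bytes-sketch", ".txt");
        Files.write(tempFile, "na\u00efve".getBytes(StandardCharsets.UTF_8));

        String text = readAsString(tempFile);
        System.out.println(text.equals("na\u00efve")); // prints true

        Files.delete(tempFile);
    }
}
```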
Third Party I/O – Reading Text
Commons – FileUtils.readLines()
Apache Commons IO is an open source Java library that comes with utility classes for reading and writing text and binary files. I listed it in this article because it can be used instead of the built-in Java libraries. The class we're using is FileUtils.
For this article, version 2.6 was used, which is compatible with JDK 1.7+.
Note that you need to explicitly specify the encoding and that the method for using the default encoding has been deprecated.
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class ReadFile_Commons_FileUtils_ReadLines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // the encoding is passed explicitly
        List<String> fileLinesList = FileUtils.readLines(file, "UTF-8");
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
Guava – Files.readLines()
Google Guava is an open source library that comes with utility classes for common tasks like collections handling, cache management, IO operations and string processing.
I listed it in this article because it can be used instead of the built-in Java libraries and I wanted to compare its performance with the Java built-in libraries.
For this article, version 23.0 was used.
I'm not going to examine all the different ways to read files with Guava, since this article is not meant for that. For a more detailed look at all the different ways to read and write files with Guava, have a look at Baeldung's in-depth article.
When reading a file, Guava requires that the character encoding be set explicitly, just like Apache Commons.
Compatibility note: This code was tested successfully on Java 8 and 9. I couldn't get it to work on Java 7 and kept getting an "Unsupported major.minor version 52.0" error. Guava has a separate API doc for Java 7 which uses a slightly different version of the Files.readLines() method. I thought I could get it to work but I kept getting that error.
import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

public class ReadFile_Guava_Files_ReadLines {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // the encoding is passed explicitly
        List<String> fileLinesList = Files.readLines(file, Charsets.UTF_8);
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
Performance Testing
Since there are so many ways to read from a file in Java, a natural question is "What file reading method is the best for my situation?" So I decided to test each of these methods against each other using sample data files of different sizes and timing the results.
Each code sample from this article displays the contents of the file to a string and then to the console (System.out). However, during the performance tests the System.out line was commented out since it would seriously slow down the performance of each method.
Each performance test measures the time it takes to read in the file – line by line, character by character, or byte by byte – without displaying anything to the console. I ran each test 5-10 times and took the average so as not to let any outliers influence each test. I also ran the default encoding version of each file reading method – i.e. I didn't specify the encoding explicitly.
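The timing approach described above can be sketched roughly as follows. This is not the benchmark code behind the numbers in this article; the class name, the method names, the use of System.nanoTime() and the choice of BufferedReader as the method under test are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadTimingSketch {

    // Reads every line without printing; returns the line count so the work has a visible result.
    static long readAllLinesOnce(String fileName) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    // Repeats the read several times and returns the average duration in milliseconds.
    static double averageReadMillis(String fileName, int runs) throws IOException {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            readAllLinesOnce(fileName);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical small temp file; the article's tests used files from 1KB to 1GB
        Path tempFile = Files.createTempFile("timing-sketch", ".txt");
        Files.write(tempFile, "line 1\nline 2\nline 3\n".getBytes(StandardCharsets.US_ASCII));

        System.out.println("average ms: " + averageReadMillis(tempFile.toString(), 5));

        Files.delete(tempFile);
    }
}
```

A simple loop like this does not account for JIT warm-up or disk caching, so treat any numbers it produces as rough.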
Dev Setup
The dev environment used for these tests:
- Intel Core i7-3615 QM @2.3 GHz, 8GB RAM
- Windows 8 x64
- Eclipse IDE for Java Developers, Oxygen.2 Release (4.7.2)
- Java SE 9 (jdk-9.0.4)
Data Files
GitHub doesn't allow pushing files larger than 100 MB, so I couldn't find a practical way to store my large test files to allow others to replicate my tests. So instead of storing them, I'm providing the tools I used to generate them so you can create test files that are similar in size to mine. Obviously they won't be the same, but you'll generate files that are similar in size to the ones I used in my performance tests.
Random String Generator was used to generate sample text and then I simply copy-pasted to create larger versions of the file. When the file started getting too large to manage inside a text editor, I had to use the command line to merge multiple text files into a larger text file:
copy *.txt sample-1GB.txt
I created the following 7 data file sizes to test each file reading method across a range of file sizes:
- 1KB
- 10KB
- 100KB
- 1MB
- 10MB
- 100MB
- 1GB
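If you would rather generate test files programmatically than copy-paste and merge them, something like the following would do. This is a hedged sketch, not the Random String Generator tool mentioned above: the class name, the 80-byte line length, the seed and the temporary file are all arbitrary choices.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class SampleFileGeneratorSketch {

    // Writes 80-byte lines of random lowercase letters until at least targetBytes are written.
    static void generate(Path target, long targetBytes, long seed) throws IOException {
        Random random = new Random(seed);
        long written = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            while (written < targetBytes) {
                StringBuilder line = new StringBuilder(80);
                for (int i = 0; i < 79; i++) {
                    line.append((char) ('a' + random.nextInt(26)));
                }
                line.append('\n');
                writer.write(line.toString());
                written += line.length(); // ASCII, so one byte per character
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file; pass your own path and size for real test data
        Path tempFile = Files.createTempFile("sample-10KB", ".txt");
        generate(tempFile, 10 * 1024, 42L);

        System.out.println(Files.size(tempFile)); // prints 10240

        Files.delete(tempFile);
    }
}
```

Because whole lines are written, the resulting file is rounded up to the next multiple of 80 bytes.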
Performance Summary
There were some surprises and some expected results from the performance tests.
As expected, the worst performers were the methods that read in a file character by character or byte by byte. But what surprised me was that the native Java IO libraries outperformed both third-party libraries – Apache Commons IO and Google Guava.
What's more – both Google Guava and Apache Commons IO threw a java.lang.OutOfMemoryError when trying to read in the 1GB test file. This also happened with the Files.readAllLines(Path) method but the remaining 7 methods were able to read in all test files, including the 1GB test file.
The following table summarizes the average time (in milliseconds) each file reading method took to complete. I highlighted the top three methods in green, the average performing methods in yellow and the worst performing methods in red:
The following chart summarizes the above table but with the following changes:
- I removed java.io.FileInputStream.read() from the chart because its performance was so bad it would skew the entire chart and you wouldn't see the other lines properly
- I summarized the data from 1KB to 1MB because after that, the chart would get too skewed with so many under performers and also some methods threw a java.lang.OutOfMemoryError at 1GB
The Winners
The new Java I/O libraries (java.nio) had the best overall winner (java.nio.file.Files.readAllBytes()) but it was followed closely behind by BufferedReader.readLine() which was also a proven top performer across the board. The other excellent performer was java.nio.file.Files.lines(Path) which had slightly worse numbers for smaller test files but really excelled with the larger test files.
The absolute fastest file reader across all data tests was java.nio.file.Files.readAllBytes(Path). It was consistently the fastest and even reading a 1GB file only took about 1 second.
The following chart compares performance for a 100KB test file:
You can see that the lowest times were for Files.readAllBytes(), BufferedInputStream.read() and BufferedReader.readLine().
The following chart compares performance for reading a 10MB file. I didn't bother including the bar for FileInputStream.read() because the performance was so bad it would skew the entire chart and you couldn't tell how the other methods performed relative to each other:
Files.readAllBytes() really outperforms all other methods and BufferedReader.readLine() is a distant second.
The Losers
As expected, the absolute worst performer was java.io.FileInputStream.read() which was orders of magnitude slower than its rivals for most tests. FileReader.read() was also a poor performer for the same reason – reading files byte by byte (or character by character) instead of with buffers drastically degrades performance.
Both the Apache Commons IO FileUtils.readLines() and Guava Files.readLines() crashed with an OutOfMemoryError when trying to read the 1GB test file and they were about average in performance for the remaining test files.
java.nio.file.Files.readAllLines() also crashed when trying to read the 1GB test file but it performed quite well for smaller file sizes.
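The three methods that ran out of memory all return the whole file as a list of lines. When you only need to look at one line at a time, a streaming loop never holds more than the current line. A minimal sketch under that assumption (the class name, the method name and the "count non-empty lines" task are made up for illustration, and it was not part of the tests above):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingLineCountSketch {

    // Counts non-empty lines while keeping only the current line in memory.
    static long countNonEmptyLines(Path path) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file with one blank line in the middle
        Path tempFile = Files.createTempFile("streaming-sketch", ".txt");
        Files.write(tempFile, "first\n\nsecond\n".getBytes(StandardCharsets.UTF_8));

        System.out.println(countNonEmptyLines(tempFile)); // prints 2

        Files.delete(tempFile);
    }
}
```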
Performance Rankings
Here's a ranked list of how well each file reading method did, in terms of speed and handling of large files, as well as compatibility with different Java versions.
Rank | File Reading Method |
---|---|
1 | java.nio.file.Files.readAllBytes() |
2 | java.io.BufferedReader.readLine() |
3 | java.nio.file.Files.lines() |
4 | java.io.BufferedInputStream.read() |
5 | java.util.Scanner.nextLine() |
6 | java.nio.file.Files.readAllLines() |
7 | org.apache.commons.io.FileUtils.readLines() |
8 | com.google.common.io.Files.readLines() |
9 | java.io.FileReader.read() |
10 | java.io.FileInputStream.read() |
Conclusion
I tried to present a comprehensive set of methods for reading files in Java, both text and binary. We looked at 15 different ways of reading files in Java and we ran performance tests to see which methods are the fastest.
The new Java IO library (java.nio) proved to be a great performer but so was the classic BufferedReader.
Source: https://funnelgarden.com/java_read_file/