My program must read text files - line by line. Files in UTF-8. I am not sure that files are correct - can contain unprintable characters. Is possible check for it without going to byte level? Thanks.
-
Do you want to check a single line, or the whole file?Eran Zimmerman Gonen– Eran Zimmerman Gonen2011-09-14 09:08:29 +00:00Commented Sep 14, 2011 at 9:08
-
Is it guaranteed, that the line feeds are correct?Michael Cornel– Michael Cornel2011-09-14 09:10:37 +00:00Commented Sep 14, 2011 at 9:10
-
check single line. Yes, line feeds are correct.user710818– user7108182011-09-14 09:15:16 +00:00Commented Sep 14, 2011 at 9:15
-
Do you mean character which cannot be printed in a specific font? There are characters which are undefined in any font. This might be the same thing.Peter Lawrey– Peter Lawrey2011-09-14 09:16:49 +00:00Commented Sep 14, 2011 at 9:16
8 Answers
Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.
E.g. (without error checking), using try-with-resources (which is in vaguely modern Java version):
String line;
try (
InputStream fis = new FileInputStream("the_file_name");
InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
BufferedReader br = new BufferedReader(isr);
) {
while ((line = br.readLine()) != null) {
// Deal with the line
}
}
3 Comments
isr bit was correct; the () are supposed to be (), not {}, and the last semicolon isn't required (but it's allowed, so I've left it -- more in keeping with the lines above it).While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:
List<String> lines = Files.readLines(file, Charsets.UTF_8);
You can then do whatever you like with those lines.
EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.
3 Comments
Just found out that with the Java NIO (java.nio.file.*) you can easily write:
List<String> lines=Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
for(String line:lines){
System.out.println(line);
}
instead of dealing with FileInputStreams and BufferedReaders...
3 Comments
If you want to check a string has unprintable characters you can use a regular expression
[^\p{Print}]
1 Comment
How about below:
FileReader fileReader = new FileReader(new File("test.txt"));
BufferedReader br = new BufferedReader(fileReader);
String line = null;
// if no more lines the readLine() returns null
while ((line = br.readLine()) != null) {
// reading lines until the end of the file
}
Source: http://devmain.blogspot.co.uk/2013/10/java-quick-way-to-read-or-write-to-file.html
1 Comment
I can find following ways to do.
private static final String fileName = "C:/Input.txt";
public static void main(String[] args) throws IOException {
Stream<String> lines = Files.lines(Paths.get(fileName));
lines.toArray(String[]::new);
List<String> readAllLines = Files.readAllLines(Paths.get(fileName));
readAllLines.forEach(s -> System.out.println(s));
File file = new File(fileName);
Scanner scanner = new Scanner(file);
while (scanner.hasNext()) {
System.out.println(scanner.next());
}
Comments
The answer by @T.J.Crowder is Java 6 - in java 7 the valid answer is the one by @McIntosh - though its use of Charset for name for UTF -8 is discouraged:
List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"),
StandardCharsets.UTF_8);
for(String line: lines){ /* DO */ }
Reminds a lot of the Guava way posted by Skeet above - and of course same caveats apply. That is, for big files (Java 7):
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
for (String line = reader.readLine(); line != null; line = reader.readLine()) {}