600,000 Lines Of String From A Text File(java Scanner ) - Programming - Nairaland

600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 3:52pm On Feb 08, 2013
I am working on code to solve the 2SAT problem. I have written the code and am about to test it with a large input file.
However, just reading the input data has taken my system more than 2 hours and counting. I'm wondering whether this is because of the Scanner class or my system, as I'm rather puzzled by this development. Any suggestions, good people?
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Shimao(m): 4:17pm On Feb 08, 2013
You need to be sure your code isn't in an infinite loop.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 5:11pm On Feb 08, 2013
Not at all; I tested it with a 10,000-line file before this particular one, which is taking forever to read into the program.
Thanks a lot. More ideas please.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by lordZOUGA(m): 5:39pm On Feb 08, 2013
can you post your source code?
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 5:54pm On Feb 08, 2013
The code below is meant to load the input data to the program but it is taking almost for ever to do this.

public static void main(String[] args) throws Exception {
    Scanner iFile = new Scanner(new FileReader("graph44.txt"));
    Graph graph = new Graph();

    while (iFile.hasNext()) {
        Node<String> a = new Node<String>(iFile.next());
        Node<String> b = new Node<String>(iFile.next());
        System.out.println(a + " " + b);

        int aPosition = graph.indexOf(a);
        int bPosition = graph.indexOf(b);

        // If a does not exist in the graph yet.
        if (aPosition == -1)
            aPosition = graph.addNode(a);

        // If b does not exist in the graph yet.
        if (bPosition == -1)
            bPosition = graph.addNode(b);

        Edge edge = new Edge(graph.getNodeAt(aPosition), graph.getNodeAt(bPosition));
        graph.addEdge(edge);
    }
}
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Javanian: 6:59pm On Feb 08, 2013
Did you code Graph and Edge yourself, or are they from the standard library?
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 7:03pm On Feb 08, 2013
@Javanian, yes, I coded Edge and Graph myself; they are not from the standard library.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Javanian: 7:09pm On Feb 08, 2013
Don't use the Scanner class. Scanner is very slow, especially for large inputs like this; rather use the classes in the java.io package...

I have said it before on this forum: only use the Scanner class when receiving small inputs from the keyboard...
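
A minimal sketch of what the switch to java.io could look like, assuming every line of the file holds two whitespace-separated tokens. The class PairReader and the method readPairs are hypothetical names used only for illustration; graph44.txt is the file name from the code posted above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class PairReader {

    // Reads one "a b" pair per line and skips blank or malformed lines.
    static List<String[]> readPairs(Reader source) {
        List<String[]> pairs = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] tokens = line.trim().split("\\s+");
                if (tokens.length == 2) {
                    pairs.add(tokens);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        List<String[]> pairs = readPairs(new FileReader("graph44.txt"));
        System.out.println("Read " + pairs.size() + " pairs");
    }
}
```

Timing this on its own, without building the graph, is one way to tell whether the reading or the graph construction is the slow part.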
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 7:17pm On Feb 08, 2013
^^ Yes, you are right about its speed compared to BufferedReader. I opted for it because of its flexibility for other tasks like nextInt(), nextDouble() etc. I didn't know it was going to be this slow. Thanks a lot for your input.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Fayimora(m): 10:58pm On Feb 08, 2013

Fixed your problem yet? If not then how about we figure out where the problem is coming from?

A little math first:
A char is 2 bytes
A String's character array needs ((2 * String.length) + 16) bytes
Let the longest string length = 30
Memory needed ~ 600000 * (32 + 16 + (30 * 2)) bytes = 64,800,000 bytes ~ 65 MB // each string has: class, offset, length, array (total 32); each array has type, length (total 16) and data

Now, I'm very sure your system has >70M of memory to spare(it does right?)
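
The estimate above can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the per-object sizes are the ones quoted in this post, not values measured on a real JVM, and the class and constant names are made up for illustration.

```java
// Rough estimate only; real object sizes depend on the JVM and its settings.
class StringMemoryEstimate {
    static final long STRING_OBJECT_BYTES = 32; // class, offset, length, array (as quoted above)
    static final long ARRAY_HEADER_BYTES = 16;  // array type and length (as quoted above)
    static final long BYTES_PER_CHAR = 2;

    static long estimateBytes(long stringCount, long charsPerString) {
        return stringCount
                * (STRING_OBJECT_BYTES + ARRAY_HEADER_BYTES + BYTES_PER_CHAR * charsPerString);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(600_000, 30);
        System.out.println(bytes + " bytes"); // 64800000 bytes
        System.out.println(bytes / 1_000_000.0 + " MB (decimal)");
    }
}
```

If each of the 600,000 lines holds two strings, the count doubles, which still leaves the total far below what a typical machine can spare.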

So let's start by making sure reading the input isn't the problem. I have never needed anything other than Scanner, so you can start with it. I'll get you started:

Scanner cin = new Scanner(new FileReader("blahblah.txt"));
long counter = 0;
while (cin.hasNextLine()) {
    cin.nextLine(); // consume the line, otherwise the loop never ends
    counter++;
}
assert counter == 600000; // run with java -ea to enable assertions


Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 9:44am On Feb 09, 2013
^^ @Fayimora, thanks. I have not overcome my challenge yet. Would you believe it: from the time I first posted this until I woke up in the night (11:30pm), it was still running. I could not take it any longer, so I halted it. I have decided to use BufferedReader instead. The major issue is that I'm facing a time constraint: I still have to test it with 800,000 and 1,000,000 lines of strings from text files. Concerning memory, it is not a problem at all, as I have more than enough.

In another development, I now understand why the big players in the financial services industry are investing billions of dollars in FPGAs to increase the speed of their systems. In that industry, 1 µs makes a lot of difference to millions of businesses all over the world.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by lordZOUGA(m): 9:56am On Feb 09, 2013
Must you process each line in the text file? If so, then I'm afraid no algorithm can save you from doing it n times.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 10:03am On Feb 09, 2013
Unfortunately, yes in this case.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by lordZOUGA(m): 10:16am On Feb 09, 2013
Otuabaroku: Unfortunately, yes in this case.
but you can do this concurrently... say split the work between five different threads...
I do not know how this is done in java but I suppose this can be done..
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 10:49am On Feb 09, 2013
lordZOUGA:
but you can do this concurrently... say split the work between five different threads...
I do not know how this is done in java but I suppose this can be done..
Good Good. That's a good idea. Would do the three tests concurrently straight away. Thanks a lot lordZOUGA.
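
A minimal sketch of running several input files concurrently with a thread pool. It assumes Java 8 or later for the lambda; the class ParallelFileRunner is a hypothetical name, and countLines is only a stand-in for the real per-file work (building the graph and solving 2SAT), which the thread does not show.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the original program.
class ParallelFileRunner {

    // Stand-in for the real work on one file.
    static long countLines(Path file) {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long n = 0;
            while (in.readLine() != null) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Processes every file on its own thread; results keep the input order.
    static List<Long> processAll(List<Path> files) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, files.size()));
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Path file : files) {
                futures.add(pool.submit(() -> countLines(file)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get()); // blocks until that file is done
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String name : args) {
            files.add(Paths.get(name));
        }
        System.out.println(processAll(files));
    }
}
```

Running the files in parallel shortens the total wall-clock time only if the machine has spare cores and enough heap for all the graphs at once; it does not make any single file faster.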
Re: 600,000 Lines Of String From A Text File(java Scanner ) by PrinceNN(m): 2:39pm On Feb 09, 2013
Uhm... if memory isn't a problem, why not use Files.readAllBytes() (Java 7) to read the whole file into a byte array?

Or you can read the file in large byte chunks with a ByteBuffer and feed them to your program:




import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class Main {
    // Chunk size in KB; increase the value as you desire :-)
    public static final int CHUNK_KB = 4;
    public static final int SIZE = 1024 * CHUNK_KB;

    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("graph44.txt");
             FileChannel channel = in.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(SIZE);
            // read() returns -1 once the end of the file is reached
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch the buffer from filling to draining
                while (buffer.hasRemaining()) {
                    byte b = buffer.get(); // Do whatever you want with each byte
                }
                buffer.clear(); // make room for the next chunk
            }
        }
    }
}

and yea....use threads
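
For the first suggestion in this post, reading the whole file in one call, a minimal sketch might look like the following. Files.readAllBytes is a real java.nio.file method (Java 7 and later); the class name ReadWholeFile, the helper countTokens, and the UTF-8 encoding are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original program.
class ReadWholeFile {

    // Counts whitespace-separated tokens in text that is already in memory.
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) throws IOException {
        // Only sensible when the whole file fits comfortably in the heap.
        byte[] bytes = Files.readAllBytes(Paths.get("graph44.txt"));
        String text = new String(bytes, StandardCharsets.UTF_8); // encoding is an assumption
        System.out.println(countTokens(text) + " tokens");
    }
}
```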
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Kobojunkie: 6:12pm On Feb 10, 2013
public static void main(String[] args) throws Exception {
    Scanner iFile = new Scanner(new FileReader("graph44.txt"));
    Graph graph = new Graph();

    while (iFile.hasNext()) {
        Node<String> a = new Node<String>(iFile.next());
        Node<String> b = new Node<String>(iFile.next());
        System.out.println(a + " " + b);
        int aPosition = graph.indexOf(a);
        int bPosition = graph.indexOf(b);

        // What you have above is not good logic. You checked that there was a next token and obtained Node<String> a, but you did not make sure there was another token before calling iFile.next() again to obtain Node<String> b. This is a bug that could cause your program to fail.

        // If a does not exist in the graph yet.
        // Try using "less than one" instead of "equal to minus one". The reason I say this is that nowhere in your code here do you actually set the value of aPosition or bPosition to -1, and in Java the default value for integers is not -1.
        if (aPosition < 1)
            aPosition = graph.addNode(a);

        // If b does not exist in the graph yet.
        if (bPosition == -1)
            bPosition = graph.addNode(b);

        Edge edge = new Edge(graph.getNodeAt(aPosition), graph.getNodeAt(bPosition));
        graph.addEdge(edge);
    }
}


Conclusion
============

It is likely that the program is failing somewhere in your code, but you aren't catching or handling the issue. Please try to unit test your class, and step through this bit to find where the potential bugs may exist. The Scanner class is not the problem.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by Otuabaroku: 12:47pm On Feb 11, 2013
^^ @Kobojunkie, you are right that I should have checked what iFile.next() returned before creating my Node objects; in other words, I should have done something like this:
String first = iFile.next();
String second = iFile.next();
if (first != null && second != null) {
    Node<String> a = new Node<String>(first);
    Node<String> b = new Node<String>(second);
}
In this case, the test data are pairs of strings on each line, e.g. 1234 1234, so it did not cause a bug in this program.
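
One detail worth noting: Scanner.next() does not return null when the input runs out; it throws NoSuchElementException. A null check therefore never fires, and the usual guard is a hasNext() call before each next(). A minimal sketch of that guard follows; the class name SafePairScanner and the choice to silently drop a dangling token are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original program.
class SafePairScanner {

    // Reads tokens two at a time, checking hasNext() before every next().
    static List<String[]> readPairs(Scanner in) {
        List<String[]> pairs = new ArrayList<>();
        while (in.hasNext()) {
            String first = in.next();
            if (!in.hasNext()) {
                break; // odd number of tokens: ignore the dangling one
            }
            String second = in.next();
            pairs.add(new String[] { first, second });
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String[]> pairs = readPairs(new Scanner("1234 1234 5678"));
        System.out.println(pairs.size() + " complete pair(s)");
    }
}
```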

However, you are wrong that if(bPosition == -1) is not advisable in this case, as my graph method that searches for a node (graph.indexOf(a)) is programmed to return -1 if the node does not exist. Hence my logic checks the position before doing anything else.

Concerning the sources of the challenge, here they are:
1. The Scanner class (the test data was very large and needed a faster class to perform the operation).
2. The System.out.println(a + " " + b); statement (this line of code added to the total time it took to produce the final result of the whole operation).


How I solved my challenge:
I used BufferedReader and removed the System.out.println(a + " " + b) statement, which I had deliberately included to check that the data was actually being read from the text file. I had three files to test, so I created multiple threads that accomplished the tasks concurrently. This obviously required more memory; I added extra heap by running, for example, java -Xmx1024M myclassname. The time it took to test the three files reduced significantly and I was able to beat the deadline.

Once again thanks a lot Kobojunkie and Prince.
Re: 600,000 Lines Of String From A Text File(java Scanner ) by WhiZTiM(m): 1:07pm On Feb 13, 2013
I did not bother reading your code because I am not a Java guy.

Well, yeah, huge problems do exist for a small program.

First of all, check your algorithm. Is there a way you can order your inputs on the fly (while reading each line)? Use a binary tree. On average, for simple data like strings, it's operationally the fastest data structure that easily stands alone.
...
Another optimization you can apply is to read the whole file (if it's a few MBs) or sizeable chunks (if the file is damn big) as binary blocks into RAM, and start processing it from there.

... I have parsed a 40MB dictionary list of 500,000 words stored in 39,000 lines... I achieved that in a minute... The reading process took only 8 secs.
(2GB DDR3, 2.0GHz Intel Core2 Duo, 32-bit Ubuntu 12.10)
Efficiency matters... Though I dunno if the JVM does loop unrolling. I guess my compiler did that for me... to assist branch prediction things...
GNU g++'s optimization system is pretty good, to such an extent that you can get the speed of C or even better.

That doesn't matter.

Secondly... Watch your algorithm for CPU cache misses... That's a bloody drawback on large datasets.

Maybe your algorithm is of quadratic time... Slice it down with sorted structures...
"Waste memory if you have to save time" ...says an ICPC-ACM veteran.
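
The thread never shows how Graph.indexOf is implemented, so the following is speculation: if it scans a list of nodes, every lookup is linear and loading n lines costs quadratic time overall. A sorted structure along the lines suggested here would avoid that. The sketch below uses java.util.TreeMap (a balanced binary tree); the class NodeIndex and its method names are hypothetical and are not the original poster's Graph API. A HashMap would work in the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not the original poster's Graph class.
class NodeIndex {
    private final Map<String, Integer> positions = new TreeMap<>(); // label -> position
    private final List<String> labels = new ArrayList<>();          // position -> label

    // Returns the position of the label, adding it first if it is new.
    // Each call costs O(log n) instead of a linear scan.
    int indexOfOrAdd(String label) {
        Integer existing = positions.get(label);
        if (existing != null) {
            return existing;
        }
        int position = labels.size();
        labels.add(label);
        positions.put(label, position);
        return position;
    }

    int size() {
        return labels.size();
    }

    String labelAt(int position) {
        return labels.get(position);
    }

    public static void main(String[] args) {
        NodeIndex index = new NodeIndex();
        int a = index.indexOfOrAdd("1234");
        int b = index.indexOfOrAdd("5678");
        int again = index.indexOfOrAdd("1234");
        System.out.println(a + " " + b + " " + again); // 0 1 0
    }
}
```

This trades a little extra memory for speed, in the spirit of the quote above.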

Re: 600,000 Lines Of String From A Text File(java Scanner ) by WhiZTiM(m): 1:17pm On Feb 13, 2013
Otuabaroku:
How I solved my challenge:
I used BufferedReader and removed the System.out.println(a + " " + b) statement, which I had deliberately included to check that the data was actually being read from the text file. I had three files to test, so I created multiple threads that accomplished the tasks concurrently. This obviously required more memory; I added extra heap by running, for example, java -Xmx1024M myclassname. The time it took to test the three files reduced significantly and I was able to beat the deadline.

Once again thanks a lot Kobojunkie and Prince.


Oh... I didn't read this... Great that you've solved it. I guess the BufferedReader class helped you by reading in buffered chunks...
For future problems like this, I recommend the book "The Algorithm Design Manual" by Prof. Skiena, a US-based algorist.
