
Padena assembly time

Mar 2, 2012 at 4:48 PM
Edited Mar 2, 2012 at 4:53 PM

Hello everybody!

I have two questions:

1. I'm using Padena on an HPC cluster. I ran standalone Padena against cluster Padena and was surprised: the cluster run took 1:01:08, while the standalone run took 0:38:17. The input file is 98.788 KB. Does MPI affect the assembly time?

2. I'm now using the standalone Padena assembler because it is faster than the cluster one. Assembling the 98.788 KB file took 0:38:17, but when I try to assemble a 231.211 KB file, the task still has not finished after 5 days. This is on a machine with an Intel Core i7 processor and 8 GB of RAM.

Here is the command prompt output for the process, from February 27, 2012, 16:15 to March 2, 2012, 11:45:

 


Initializing - End time: 27/02/2012 04:15:24 p.m.

Step1&2: Create Kmer and Graph - Start time: 27/02/2012 04:15:24 p.m.


2437152 sequence(s) processed.

3057339 sequence(s) processed.

3545294 sequence(s) processed.

Processed total 3659310 sequencecs.   

Graph built successfully. 

  GenerateLinks Started.... 

  Generate Links Ended.

Step1&2: Create Kmer and Graph - End time: 27/02/2012 04:57:05 p.m.

Estimating default values - Start time: 27/02/2012 04:57:06 p.m.


Estimating default values - End time: 27/02/2012 04:57:06 p.m.

Step3: UndangleGraph - Start time: 27/02/2012 04:57:06p.m

...........................................................................................................................................................................................

............................................................................................................................................................................................

.............................................................................................................................................................................................

.............................................................................................................................................................................................

..............................................................................................................................................................................................

....................................................................................................................................

 

If Padena uses the same de Bruijn graph based algorithm as ABySS, which runs in polynomial time, why does increasing the file size increase the assembly time as if it were an exponential-time algorithm?

Is there any evidence of the assembler processing large sequences?

 

greetings,

 

@MontesLeonardo

Mar 5, 2012 at 5:37 PM

Hi Leonardo,

Please look at my answers to your previous questions about PadenaUtil on HPC in the forum threads 'Padena execution in HPC cluster' and 'Use of PADeNA in a distributed memory enviroment'. In these and in your email conversations with Rick, we have tried to make it clear that PadenaUtil IS NOT WRITTEN FOR operation on an HPC cluster.

This does not mean it will not run on a cluster - as you have shown, it will. It does mean it will not take advantage of any cluster feature - it probably only runs on one cluster node exactly as it would if that was a standalone machine.

PadenaUtil is designed to run on a single machine - although that machine may have multiple cores and processors. If you have questions about running PadenaUtil in this environment please let us know.

Simon

 

 

Mar 5, 2012 at 7:20 PM
Edited Mar 5, 2012 at 7:24 PM

Hi Simon,

 

As I said, "I'M USING THE STANDALONE PADENA ASSEMBLER". I'm not using the HPC cluster because, as I also said, "IT IS SLOWER ON THE CLUSTER". I know that. I'm running Padena on a single machine with multiple cores (Intel Core i7), and I remember your answers perfectly.

My question: I'm using the same single machine to assemble two different sequences:

Sequence 1

Size: 98.788 KB

Assembly time: 0:38:17

Sequence 2

Size: 231.211 KB

Assembly time: more than five days

 

My question again:

If Padena uses the same de Bruijn graph based algorithm as ABySS, which runs in polynomial time, why does increasing the file size increase the assembly time as if it were an exponential-time algorithm, USING A SINGLE MACHINE WITH NO HPC CLUSTER INVOLVED?

Is there any evidence of the assembler processing large sequences?
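To put numbers on the comparison above, here is a rough Python sketch that computes which exponent p a simple power law (time proportional to size^p) would need in order to explain both measurements. The power-law model is only an assumption for illustration, and the five-day figure is a lower bound because the run had not finished. The function name is made up for this example.

```python
import math

def implied_exponent(size_small, size_large, time_small_s, time_large_s):
    """Return p such that time ~ size**p would fit both measurements."""
    size_ratio = size_large / size_small
    time_ratio = time_large_s / time_small_s
    return math.log(time_ratio) / math.log(size_ratio)

# Figures quoted in this thread. Only the ratio of the sizes matters,
# so the unit of the file size does not affect the result.
small_size, large_size = 98.788, 231.211
small_time_s = 38 * 60 + 17        # 0:38:17
large_time_s = 5 * 24 * 3600       # "more than five days": a lower bound

p = implied_exponent(small_size, large_size, small_time_s, large_time_s)
print(f"size ratio: {large_size / small_size:.2f}")
print(f"time ratio: at least {large_time_s / small_time_s:.0f}")
print(f"implied exponent: at least {p:.1f}")
```

A very high implied exponent does not by itself prove the algorithm is exponential; it only shows that the slowdown is far out of proportion to the growth in file size.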

 

Greetings, and hoping for a prompt answer,

 

@MontesLeonardo

Mar 6, 2012 at 5:01 PM

Leonardo, I just received the data that Jan noted in issue 7780. Perhaps this will uncover an issue that is affecting Padena both in his case and yours. We did not see issues of this type during testing. The input data set, rather than the size of the output, has the biggest impact on assembly time.

In the meantime, you could try a couple of things:

Try assembling a subset of the data.

Create a new issue and attach the data so that we or others in the community can replicate the problem and run it under a debugger to isolate it.
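For the first suggestion, a subset can be cut with a few lines of Python. This is a minimal sketch that assumes the reads are in a plain-text FASTA file (records starting with a ">" header line); the thread does not say which format is being used, and the function name and file names are hypothetical.

```python
def take_first_records(in_path, out_path, max_records):
    """Copy at most max_records FASTA records to out_path.

    Assumes FASTA input: each record starts with a '>' header line.
    Returns the number of records copied.
    """
    copied = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if copied == max_records:
                    break          # enough records, stop reading
                copied += 1
            if copied:             # skip anything before the first header
                dst.write(line)
    return copied

# Hypothetical usage: keep the first 500,000 reads of a larger file.
# take_first_records("reads.fasta", "reads_subset.fasta", 500_000)
```

Running the assembler on subsets of increasing size (for example 25%, 50%, 75% of the reads) would show at what point the run time stops scaling reasonably.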

I will update 7780 with our findings. Many thanks and sorry for the problems you have encountered.

Rick for the .NET Bio team

Mar 26, 2012 at 5:15 PM

Hi Leonardo,

Two other questions --

1) How much memory do you have?  It's a very memory hungry algorithm.

2) Are you running anti-virus? You might disable any real-time file protection while you do your run.
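On the memory question: de Bruijn graph assemblers generally keep one graph node per distinct k-mer, so memory use tends to follow the number of distinct k-mers in the reads rather than the file size alone. The sketch below counts distinct k-mers in a list of reads as a rough indicator of graph size. It is an illustration only: it is not how Padena stores its graph internally, it ignores reverse complements, and the function name is made up for this example.

```python
def count_distinct_kmers(reads, k):
    """Count distinct substrings of length k across all reads.

    Rough proxy for the number of de Bruijn graph nodes.
    Reads shorter than k contribute nothing.
    """
    kmers = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return len(kmers)

# Tiny example: the read below contains ACGT twice, so only
# four of its five 4-mers are distinct.
print(count_distinct_kmers(["ACGTACGT"], 4))
```

Comparing this count for the two input files would show whether the larger file produces a disproportionately larger graph.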

mark

Mar 27, 2012 at 4:12 AM

Hi Mark,

 

1) I'm using 8 GB of RAM. I will try a simple HPC cluster with 18 GB in the next few days.

2) And yes, I was using Microsoft Forefront Security.

 

Let me know all your questions about this project; I'm very interested in hearing them.

 

greetings,

 

Leonardo