Hiline1961
Guest
Hi. I'm probably overthinking this so I'm confusing myself.
I have 10 original URLs whose content I want to parse asynchronously with HttpClient. Once I parse the content, I want to find the "Next URL" tag, which I will then want to parse as well. This continues until there is no "Next URL" tag.
I don't want to call HttpClient recursively, so I was thinking about a Semaphore/Concurrent Collection.
My initial thought is this.
1) Create a Concurrent Bag with the initial 10 URLs.
2) Use a Semaphore to limit access to the Concurrent Bag.
3) Parse each URL's content with HttpClient and find the Next URL.
4) Add the Next URL to the Concurrent Bag before releasing the Semaphore so that it will be processed subsequently.
5) At the end of all this, I want a Collection of URLs that includes the 10 Original URLs and all Next URLs.
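For reference, the steps above can be sketched roughly like this (untested; `ExtractNextUrl` is a placeholder for whatever parsing actually locates the "Next URL" tag, and the starting URLs are made up):

```csharp
using System.Collections.Concurrent;

// Sketch of steps 1-5: workers drain a shared bag, fetch each URL,
// and push any discovered "Next URL" back into the bag.
var originalUrls = new[] { "https://example.com/a", "https://example.com/b" };
var httpClient = new HttpClient();
var bag = new ConcurrentBag<string>(originalUrls);     // step 1: initial URLs
var results = new ConcurrentBag<string>(originalUrls); // step 5: every URL seen
var gate = new SemaphoreSlim(3);                       // step 2: cap concurrent requests

async Task WorkerAsync()
{
    while (bag.TryTake(out var url))
    {
        await gate.WaitAsync();
        try
        {
            var html = await httpClient.GetStringAsync(url); // step 3
            var next = ExtractNextUrl(html);
            if (next != null)
            {
                bag.Add(next);      // step 4: add before releasing the semaphore
                results.Add(next);
            }
        }
        finally
        {
            gate.Release();
        }
    }
}

await Task.WhenAll(Enumerable.Range(0, 3).Select(_ => WorkerAsync()));

static string? ExtractNextUrl(string html) => null;    // placeholder parser
```

Note that this naive loop exhibits exactly the problem in question (b) below: a worker whose `TryTake` fails exits even though another worker may still be about to add new work.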
This is where I confuse myself.
a) If an item is removed from the Concurrent Bag (CB1) once it is processed, does this mean I need a second Concurrent Bag (CB2) to hold the items that have been processed from CB1? That way, when I've processed all the items in CB1, they will be in CB2?
b) How do I know I'm done with CB1? Let's say I have 3 threads. Threads 1 and 2 reach the end of their Next URLs and release, so there is nothing new to process. However, Thread 3 is still working and finds a Next URL. If I add it to the Concurrent Bag, how do I ensure it gets processed?
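To make question (b) concrete, one common pattern (not from the original post; `ExtractNextUrl` is again a hypothetical parser) is to track the number of URLs still "in flight" and only signal completion when that count reaches zero. Each new Next URL increments the counter before the current item decrements it, so the count can only hit zero when no thread can produce more work:

```csharp
using System.Collections.Concurrent;

// Sketch: a pending-work counter decides when the queue is truly done.
var originalUrls = new[] { "https://example.com/a", "https://example.com/b" };
var queue = new BlockingCollection<string>();
var results = new ConcurrentBag<string>(); // answers (a): processed URLs kept here
int pending = originalUrls.Length;         // URLs not yet fully processed
foreach (var url in originalUrls) queue.Add(url);

async Task WorkerAsync(HttpClient client)
{
    // GetConsumingEnumerable blocks until an item arrives or
    // CompleteAdding is called, so a still-working thread's new
    // Next URL is always picked up by some consumer.
    foreach (var url in queue.GetConsumingEnumerable())
    {
        results.Add(url);
        var html = await client.GetStringAsync(url);
        var next = ExtractNextUrl(html);
        if (next != null)
        {
            Interlocked.Increment(ref pending); // register new work first...
            queue.Add(next);
        }
        if (Interlocked.Decrement(ref pending) == 0)
            queue.CompleteAdding();             // ...then mark this item done
    }
}

static string? ExtractNextUrl(string html) => null; // placeholder parser
```

The ordering matters: the increment for a discovered URL happens before the decrement for the current one, so the counter never dips to zero while work still exists.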
Thanks much!!!