Our development engineers are called “Scientists” at Adobe, which is cool because it means that we understand the importance of experimentation. Like all good scientists, we hypothesize and test. Some things make it into the product, others don’t, but we’re always innovating. Here’s an example that Rupesh Kumar (CF Computer Scientist / Software Engineer) built on his own time after talking with a number of our customers about asynchronous processing in CF. There are not a lot of bells and whistles, mind you, but it is still a great Proof Of Concept of the CFTHREAD and CFJOIN tags. I’ve also included a couple of simple example templates to demonstrate the tags.
The first example copies a file 50 times, sleeping 200 ms after each copy. The second example launches 50 threads with CFTHREAD to do the same work, then joins them back up at the end of the page with CFJOIN.
This POC works with CF 7.0.2 (Standard or
A note on the simple example: you’ll want to modify both templates to copy a file (I use a “C:\baseline.txt” file) that exists on YOUR system to a directory that exists on YOUR system (I used “C:\____WORKFOLDER”). I’d recommend turning on debugging so you can see the execution times yourself as well (much more dramatic that way).
Also note that spawned threads shouldn’t count towards Simultaneous Threads slots of the CF Server, so you shouldn’t need to adjust that setting to accommodate your extra threading.
Running the serialized example with debugging enabled yields this execution time block:
Execution Time
| Total Time | Avg Time | Count | Template |
| 10072 ms | 10072 ms | 1 | C:\Inetpub\wwwroot\threadsynch_serial.cfm |
| 0 ms | | STARTUP, PARSING, COMPILING, LOADING, & SHUTDOWN | |
| 10072 ms | | TOTAL EXECUTION TIME | |
red = over 250 ms average execution time
And running the CFTHREAD example with debugging enabled yields this execution time block:
Execution Time
| Total Time | Avg Time | Count | Template |
| 235 ms | 235 ms | 1 | |
| 0 ms | | STARTUP, PARSING, COMPILING, LOADING, & SHUTDOWN | |
| 235 ms | | TOTAL EXECUTION TIME | |
red = over 250 ms average execution time
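The gap between the two timings follows from the arithmetic: 50 copies each followed by a 200 ms sleep add up to roughly 10 seconds when run one after another, while 50 threads sleeping at the same time finish in a little over 200 ms. Below is a minimal sketch of that idea in plain Java rather than CFML, since the POC tag syntax is unsupported and may change. The class and method names (`ParallelSleepDemo`, `runSerial`, `runThreaded`) are made up for illustration, and the file copy is replaced by a bare sleep so the sketch has no dependency on files existing on your system.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelSleepDemo {
    // Stand-in for "copy a file, then sleep": this sketch only sleeps.
    static void doWork(long sleepMs) {
        try {
            Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Serial version: elapsed time is roughly tasks * sleepMs.
    static long runSerial(int tasks, long sleepMs) {
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            doWork(sleepMs);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Threaded version: start every thread first, then join each one,
    // which is the role CFTHREAD and CFJOIN play in the example templates.
    static long runThreaded(int tasks, long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Thread t = new Thread(() -> doWork(sleepMs), "thread_" + (i + 1));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for this worker before the "page" finishes
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("serial:   " + runSerial(50, 200) + " ms");
        System.out.println("threaded: " + runThreaded(50, 200) + " ms");
    }
}
```

Exact numbers will vary by machine; the point is only that the threaded total tracks the longest single task rather than the sum of all tasks.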
To be crystal clear, this little POC is not a CF product feature at this time. This is just a simple and unsupported engineering Proof Of Concept. I’d personally welcome your feedback and comments and I’d love to hear how folks might be able to use such functionality, but the POCs are unsupported. There is also no guarantee that the tag syntax will not change (perhaps drastically) in the future, and we have no plans to update the POC as future major versions of CF come out, etc, etc. CFTHREAD and CFJOIN tags may or may not make it into a future release, but that partly depends on feedback from customers.
One known issue/observation from playing with this personally: there appears to be a bug where, if you don’t rejoin (using CFJOIN) all the threads you've spawned by the end of the page, you get a “500 Null” error (at least with my example, if you comment out the CFJOIN loop). However, calling CFTHREAD (once) and skipping the CFJOIN appears to work fine. Just so you’re aware. You still get amazing parallelism as demonstrated above, even if your page waits for all threads to complete. Maybe if there’s interest we could fix that one thing, but for now, do play with these POC tags and let me know if they’re useful to you.
DOWNLOAD CFTHREAD POC & SAMPLES (74k)
Damon
It's great to see this released. This has been something I've really been looking forward to in CFMX. While the Event Gateways are awesome, being able to asynchronously process information in a new thread is really helpful in some tasks--such as using <cfindex /> to update a collection (which is a time consuming process that the user should not have to wait on.)
However, since this is a "non-support" project, what are the chances of getting the source code released? That way the community can fix any potential problems with the code (such as the 500 null errors?)
Damon, you have no idea of how great your timing is on this ...I JUST started rolling my own implementation to support a new project in-house. We've used a homegrown CF reporting platform for years here but have always missed the more advanced management features of the pricey reporting/BI packages out there... In a nutshell we're creating a management/monitoring console in Flex 2 with data services ...with a report manager (CF) which controls/monitors threads on the system. Very, very slick! :)
I also love the fact you guys are willing to dump this out there even when it's not official. It's the best way to get our feedback!
It seems to be pretty random how many threads actually complete, but in very unscientific testing the most I've seen complete is 28 out of the 50 threads.
If we had the source code, this is a problem we could probably resolve ourselves. :)
Here's the modified code I'm using. It just requires a "tmp" folder in the directory which the threadsynch_cfthread.cfm resides.
<cfset sDirectory = expandPath(".") & "\tmp" />
<cfsavecontent variable="sContent">XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX </cfsavecontent>
<cflock name="WORKERLOCK" timeout="999" type="EXCLUSIVE">
<cfset Server.ThreadsCounter = 0>
</cflock>
<cfloop index="loopcounter" from="1" to="50">
<cfset threadname = "thread_" & #loopcounter#>
<cfthread name="#threadname#">
<cflock name="WORKERLOCK" timeout="999" type="EXCLUSIVE">
<cfset x = Server.ThreadsCounter + 1 />
<cfset Server.ThreadsCounter = Server.ThreadsCounter + 1>
</cflock>
<cfset sOutput = "" />
<cfloop index="i" from="1" to="100">
<cfset sOutput = sOutput & sContent />
</cfloop>
<cffile action="write" file="#sDirectory#\tmp_#x#.txt" output="#sOutput#" />
<cfset thisThread = CreateObject("java", "java.lang.Thread")>
<cfset thisthread.sleep(200)>
</cfthread>
</cfloop>
<cfloop index="loopcounter" from="1" to="50">
<cfset threadname = "thread_" & #loopcounter#>
<cfjoin thread="#threadname#">
</cfloop>
<cfoutput><br><br>#Server.ThreadsCounter# files now written.<br></cfoutput>
<cflock name="WORKERLOCK" timeout="999" type="EXCLUSIVE">
<cfset Server.ThreadsCounter = 0>
</cflock>
<br> Work Complete!<br>
Naturally, with the EventGateway side of things, you could achieve the same results. But why force a CFML developer to drop to Java when you have this beautiful tag-based language to express your solutions in? This has always puzzled us.
The BlueDragon implementation has a whole suite of tags and functions to support this feature set, which also runs on both .NET and JAVA platforms. There is a lot of management to do when you start giving developers this ease of access to a very powerful set of processing directives.
If you think a simple CFLOOP can take down a server, then that's nothing compared to giving them access to spawn threads to their heart's content! :)
http://www.houseoffusion.com/cf_lists/messages.cfm/forumid:4/threadid:26690#134416
In all seriousness I don't particularly care where the innovation stems from. If BD generated the initial buzz, great, kudos to competition! How about we avoid any debates and just try out the tags in question? :)
WRT posting source: maybe. We'll see. Right now we're focused on Scorpio with some important milestones coming up, so we're very busy, but I'll see if we can't tweak the main issues with this (need to use CFJOIN + no "arguments" argument to cleanly pass in args).
Dan, not sure about your missing threads. Do you get the number of files copied you expect? Watching threadcount on the CF/JRun process is not an accurate way to see if all threads complete...the VM does its own thing at the OS level.
Thanks for the comments guys...love to see any examples of how you might use it.
Damon
Damon,
No, I'm not seeing the number of files created that I expect (at least not all the time) and that's even *with* the <cfjoin /> tags in there. Sometimes I'm even seeing the 500 null message even w/the <cfjoin /> tag in there. This seems to happen when I do a very simple load test.
Here's a high-level description of what I'd like to do:
Create a facade that can accept incoming report requests, spawn a new thread, and execute the report request. (In this use case we're talking about offline report generation, so the delivery mechanism is not part of the equation.)
From a management perspective, I need to be able to manage these threads. Perhaps methods to retrieve metadata on each or all threads (running time etc.) and perhaps the ability to "kill" a thread if so required.
Anyway, just filling in the blanks so you guys have an idea on a potential use.
So here's my vote to please continue development of this feature and make it a regular supported feature of the language!
Thank you.
For example, I just ran the test script again and I get no error messages--but I only get 6 files created.
I've even tried using a static named lock on the <cffile /> calls to see if it's a threading issue w/the file operations. That didn't seem to change the behavior at all.
I've also checked the CFusionMX7\runtime\logs and CFusionMX7\logs for any related errors that might be generated, but I'm not finding anything--other than when I actually get the 500 null error.
Now about those enhancements to CFCs... :)
Ok, I've gotten a little further. It seems the variables are not thread safe.
The problem I seem to be running into is any variables within the <cfthread /> tag will be affected by changes by the other threads.
What it seems we really need is a "thread" scope which is unique to each thread. I've tried using:
<cfthread name="#threadname#"> <cfset threadScope = structGet(threadname) />
The problem is, once inside the running thread, the variable "threadname" is no longer thread safe. So that doesn't work.
What I'd love to see is a scope called "thread" that you use inside the <cfthread /> tags to make thread-safe variables.
Outside of the <cfthread /> tag, you could use the thread name.
For example:
<cfthread name="dansThread"> <cfset thread.name = "Dan" /> </cfthread>
<cfthread name="rupeshThread"> <cfset thread.name = "Rupesh" /> </cfthread>
<cfjoin thread="dansThread" /> <cfjoin thread="rupeshThread" />
<cfoutput> #dansThread.name# <br /> #rupeshThread.name# </cfoutput>
I think that would take care of the threading issues.
Actually, after rethinking a bit, I'm wondering if maybe all variables should be local to the thread, and you use something like the Caller scope to access variables from the core template.
Also, I'm wondering if there shouldn't be a thisThread scope that contains information about the thread (ie. thisThread.name)
These changes would allow something like the following:
<cfloop index="i" from="1" to="10">
<cfset sThreadName = "thread_" & i />
<cfset threads[sThreadName] = i />
<cfthread name="#sThreadName#">
<cfset thread.thisLoop = caller.threads[thisThread.name] />
<cfset thread.message = "I was spawned from loop " & thread.thisLoop />
</cfthread>
</cfloop>
<cfloop index="i" from="1" to="10">
<cfjoin thread="thread_#i#" />
</cfloop>
<cfloop index="i" from="1" to="10">
<cfset x = structGet("thread_#i#") />
<cfoutput>#x.message#</cfoutput> <br />
</cfloop>
Does this make sense?
Adobe and New Atlanta both develop solid products for CF and I just want to keep seeing the language get smarter, faster and even more capable. I have to admit, I have been worrying a bit about programmatic / development improvements to CF losing concern amongst all the other things going on.
Thanks again and let me know where to put my vote for getting these tags in the next release!
Mike.
1. Metadata: I really like that idea a lot, getting time elapsed and so on.
2. Custom error catching for this, so we could perhaps write
<cftry>
<cfthread> ... </cfthread>
<cfcatch type="thread"> thread did not process </cfcatch>
</cftry>
that would be neat as well.
http://curl.haxx.se/
It's easy to make your threads "thread safe" by taking advantage of something where you can control scopes (like cffunction). I've posted a code example on my blog to show how you can take advantage of this.
http://www.schierberl.com/cfblog
On a side note, if I had a vote, I would say that threads should have their own scope and variables must be passed into a thread as an argument, just like a function or custom tag.
I posted over at your blog. While it addresses newly declared variables, it doesn't address passing in variables for which you need a snapshot of the value at the time the thread was created.
For example:
<cfloop index="i" from="1" to="10">
<cfthread name="thread#i#">
<cfset y = i />
</cfthread>
</cfloop>
In the above example, there needs to be a way to make sure that in "thread1" the value of Y equals 1, the value in "thread2" equals 2 and so on.
At this point, I've seen no way to do that.
Maybe it's going to be necessary to have an "input" attribute for each thread which will allow you to put in some static snapshot of a variable which you can use in your thread.
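For comparison, here is how the same "snapshot at creation time" requirement is usually met in plain Java, which is what CF runs on. This is only a sketch of the concept, not how the POC tags work internally, and the names (`SnapshotDemo`, `runWithSnapshots`) are invented for illustration. Each worker gets its own copy of the loop index taken when the thread is created, so later changes to the loop variable cannot affect it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SnapshotDemo {
    // Starts `count` threads; thread N must end up with the value N,
    // regardless of how far the loop has advanced by the time it runs.
    static int[] runWithSnapshots(int count) throws InterruptedException {
        int[] results = new int[count]; // one slot per thread, so no two threads write the same element
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            final int snapshot = i; // copy of the loop index taken at thread creation
            Thread t = new Thread(() -> results[snapshot - 1] = snapshot, "thread" + snapshot);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // after join, the value written by that thread is visible here
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(runWithSnapshots(10)));
    }
}
```

An "input" attribute on the tag, as suggested above, would play the role of the `snapshot` copy here: a value fixed at spawn time and owned by one thread only.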
I wonder how NewAtlanta handles this.
You're right, you got me... passing a value as an argument still treats it as being passed by reference. Guess I need to give another note to giving threads their own scope. From what I can remember from cfunited, the New Atlanta approach is to give the thread access to tag attributes, application scope (maybe server too). Here's what Vince used as an example.
<cfthread name="myThread">
<!--- myThread has a new Variables scope, like a custom tag, and can access the Application scope and tag Attributes --->
<cfset variables.greeting="Hello, World">
<!--- ...do some more work here... --->
</cfthread>
I'm guessing it would look like
<cfthread name="mythread" myAttribute1="foo" myAttribute2="foo2"> </cfthread>
It's a pretty poor affair when there is no option to stop receiving alert emails just because I happened to leave a reply!!!
Make the madness stop someone PLEASE!
The thread-safety issue is most likely just an implementation detail (a.k.a. "a bug") in this particular version of the cfthread lib that needs to be worked out. The base Thread class in Java already allows for simple "thread-specific data" (e.g. the name of a thread), and therefore it should be possible to create a data structure of some sort to track unique user-thread instances in order to avoid "the clobber scenario". You could even subclass Thread if you need additional thread-specific data.
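To make that concrete, here is one way thread-specific data can be kept in Java, using the standard `ThreadLocal` class rather than a `Thread` subclass. It is a sketch only and says nothing about how the cfthread library is actually implemented. The names `ThreadScopeDemo`, `THREAD_SCOPE`, `PUBLISHED`, and `spawn` are hypothetical; the two thread names mirror the "thread scope" proposal earlier in the comments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadScopeDemo {
    // Hypothetical per-thread "scope": every thread that calls get() sees its own map.
    private static final ThreadLocal<Map<String, Object>> THREAD_SCOPE =
            ThreadLocal.withInitial(HashMap::new);

    // Finished scopes, keyed by thread name, so the spawning code can read them after a join.
    private static final Map<String, Map<String, Object>> PUBLISHED = new ConcurrentHashMap<>();

    static Thread spawn(String threadName, String value) {
        Thread t = new Thread(() -> {
            Map<String, Object> scope = THREAD_SCOPE.get(); // private to this thread
            scope.put("name", value);                       // cannot clobber another thread's "name"
            PUBLISHED.put(Thread.currentThread().getName(), scope);
        }, threadName);
        t.start();
        return t;
    }

    static String[] run() throws InterruptedException {
        Thread dan = spawn("dansThread", "Dan");
        Thread rupesh = spawn("rupeshThread", "Rupesh");
        dan.join();
        rupesh.join();
        return new String[] {
            (String) PUBLISHED.get("dansThread").get("name"),
            (String) PUBLISHED.get("rupeshThread").get("name")
        };
    }

    public static void main(String[] args) throws InterruptedException {
        String[] names = run();
        System.out.println(names[0]);
        System.out.println(names[1]);
    }
}
```

Both threads write to a key called "name", but each write lands in that thread's own map, which is the behavior the proposed "thread" scope is asking for.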
I'm totally aware of this--which is why I'm posting comments to help them be aware of issues. I'm not "condemning" the code, just trying to make people aware of the issues in using the tag in the state it is in now.
It can still be very useful in certain situations (such as spawning off a thread to index a Verity collection) where maybe the variables are read-only and you can create static variables for the thread that won't change.
Understood. I was just trying to say that a fix should be possible. As you said before, it'd be nice to have the source. :-)
I moved the closing cflock tag to just below the sleep call, so that the copy operation was locked as well as the thread counter. This resulted in a slightly slower running page, but in a proper sequence of generated files.
Laterz, J
If you move the </cflock> closing tag to just below the file copy operation and just before the thread is put into sleep mode, it runs blazing fast but doesn't use the same file names.
Laterz, J
1) the code runs, the files get created, and no output is generated for display (all 50 files are there but nothing is on the screen like X FILES CREATED) 2) the code runs, all the copies happen, but the screen sees an InvalidState error and a cf stack trace
Just wanted to say something before I forgot. :)
Laterz, J
You mention that skipping <cfjoin> is okay if you call <cfthread> only once.
However that doesn't seem to be the case.
I loaded the following code 10 times. It produced the expected screen output 9 times, but produced only 4 files.
<cfthread name="myThread">
<cffile action="COPY" source="C:\Baseline.txt" destination="C:\____WORKERFOLDER\#timeFormat(Now(),'HHmmss')#-#RandRange(1000,9999)#.txt">
</cfthread>
<h3>Operation complete</h3>
This is major spam time, and I can't see any way to unsubscribe myself from this (cf?)thread!
http://www.dcooper.org/blog/client/index.cfm?mode=entry&entry=A71F310C-4E22-1671-5E287AE8918A048B
Enjoy!
Damon
While not an "official" answer obviously, the CFTHREAD tag is currently only tag-based. Because of the nature of the tag, there would be no way to wrap it up as a UDF, so you currently would not be able to access it via CFSCRIPT.
They'd specifically have to support some kind of CFSCRIPT syntax--which would end up probably being pretty klunky.
However, you could always use CFSCRIPT inside your CFTHREAD tags. Just break your CFSCRIPT calls up.
In Scorpio, fyi, CFJOIN is no more and this tag has been dramatically enhanced.
Thanks to everyone for their feedback, this has been very valuable to us, and hopefully it has been for you guys as well!
Damon
