[Vanilla List] Growing Netrek - Measures

Fri Dec 7 00:27:00 CST 2001

On Fri, Dec 07, 2001 at 09:27:18AM +1100, James Cameron wrote:
> On Wed, Dec 05, 2001 at 11:38:38PM -0500, Mark Mielke wrote:
> > select() is only implemented for sockets. 
> Wrong.  I'm using Cygwin.  Anyway, it's not relevant, since it is only
> sockets that the server uses.  File I/O can be ignored, in my opinion.

Worse, select() is emulated in CYGWIN. It will be your true bottleneck.

Then again, what other purpose do some 1.6 GHz P4 machines have than
spending 90% of their CPU emulating UNIX with CYGWIN to run a Netrek
server... :-)

> > This is not to mention that an event driven system is a system that
> > switches in user space.
> What?  The switch has to be done somewhere.  It is much better if it
> is under the direct control of the programmer.  More efficient.  O/S
> level context switches are by comparison much less efficient, as they
> unnecessarily save registers.

Often this is true. The reason it is true is that most people are
not familiar with programming in a threaded environment, and often end
up making critical sections much larger than they need to be, which
serializes work that could otherwise run in parallel.
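To illustrate what a small critical section looks like, here is a minimal sketch that copies shared data while holding a mutex and performs the potentially slow write() only after releasing it. The struct game_state and send_update() names are hypothetical and are not taken from the Netrek source; only the shape of the locking matters here.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical shared game state (stand-in for Netrek's shared memory). */
struct game_state {
    pthread_mutex_t lock;
    char packet[64];   /* update data to be sent to a client */
    size_t len;        /* number of valid bytes in packet */
};

/* Copy the update under the lock, then release the lock before write().
   A slow client socket therefore never holds up other threads that
   need the same lock. */
ssize_t send_update(struct game_state *g, int fd)
{
    char local[sizeof g->packet];
    size_t n;

    pthread_mutex_lock(&g->lock);
    n = g->len;
    assert(n <= sizeof local);
    memcpy(local, g->packet, n);
    pthread_mutex_unlock(&g->lock);

    return write(fd, local, n);   /* no lock held during the system call */
}
```

The alternative, calling write() while still holding the lock, is correct but serializes every client behind the slowest socket, which is the failure mode described above.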

In terms of potential, a threaded program has significant gains over a
process switched program. What are these gains? 5 processes can be
performing writes at the same time, while a 6th is performing
calculations. On a multi-CPU machine, a 7th or 8th process can also be
performing calculations.

Some systems support asynchronous I/O. I have not seen asynchronous
I/O used by any major (UNIX) product in existence.

With select(), one can only perform a single read() or write() at a time,
and CPU work such as daemon updates needs to be scheduled between
these system calls. Sure, larger system buffers can improve
performance, but for a real-time system with a large amount of data
being actively transferred back and forth between 16 or more
clients, performance will suffer significantly.
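To make the scheduling point concrete, here is a minimal sketch of a single-threaded select() loop. It assumes pipes or sockets as the client descriptors and uses a hypothetical update() callback in place of the daemon. This is not the Netrek server's code; event_loop(), count_updates() and period_usec are illustrative names. Note that while update() runs, no descriptor is serviced at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical daemon hook, run once per update period. */
typedef void (*update_fn)(void *ctx);

/* Sample update callback: just counts how many updates ran. */
void count_updates(void *ctx) { ++*(int *)ctx; }

static long long now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Runs 'iterations' passes of a select() loop over 'fds'.
   Returns total bytes read, or -1 if select() fails. */
long long event_loop(const int *fds, int nfds, int iterations,
                     long long period_usec, update_fn update, void *ctx)
{
    char buf[512];
    long long total = 0;
    long long next_update = now_usec() + period_usec;

    assert(nfds > 0);
    for (int it = 0; it < iterations; it++) {
        fd_set rset;
        int maxfd = -1;
        FD_ZERO(&rset);
        for (int i = 0; i < nfds; i++) {
            assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
            FD_SET(fds[i], &rset);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        /* Block in select() no longer than the time left until the next update. */
        long long remaining = next_update - now_usec();
        if (remaining < 0)
            remaining = 0;
        struct timeval tv;
        tv.tv_sec = (time_t)(remaining / 1000000);
        tv.tv_usec = remaining % 1000000;

        int ready = select(maxfd + 1, &rset, NULL, NULL, &tv);
        if (ready < 0)
            return -1;

        /* Service ready clients one read() at a time. */
        for (int i = 0; i < nfds; i++) {
            if (FD_ISSET(fds[i], &rset)) {
                ssize_t n = read(fds[i], buf, sizeof buf);
                if (n > 0)
                    total += n;
            }
        }

        /* Run the update when due; no client I/O happens until it returns. */
        if (now_usec() >= next_update) {
            update(ctx);
            next_update += period_usec;
        }
    }
    return total;
}
```

As a usage example, with one pipe holding five bytes and period_usec set to 0, three iterations read the five bytes once and run the update on every pass.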

> > This complicates the code significantly, and ensures that operations
> > cannot happen asynchronously.
> I disagree.  Even with threads, on a single CPU, only one thing is
> happening at once.  The Netrek server can support 20 players on a 486,
> at 100MHz, so it doesn't need a dual processor.

This is not actually true. 16 ntserv processes can be performing
write() operations while the daemon is updating, or while the robot is
determining its next move. Move to a single-threaded, single-process,
event-driven model, and this is no longer true.

> > As an example, with the current process model, 16 ntserv processes 
> > can be actively waiting for I/O while the daemon process is executing
> > code.
> A wait is not active.  If I/O completes, the processes become schedulable,
> joining the run queue, and while the operating system is entitled to
> interrupt the daemon in order to process the I/O, most O/S's do not.
> The daemon will relinquish the processor soon.

A wait *can* be active. Consider read()/write(). Waiting for a write() to
complete doesn't mean that nothing is happening. In an event driven model,
waiting for select() *is* waiting.

> Now while my experience with operating system design and performance
> optimisation techniques is limited to OpenVMS and Linux, I would venture
> to say that the Win32 environment can't be terribly bad at this.

In some ways the Win32 environment is significantly better; in others it
is much worse. In any case, CYGWIN is an emulation layer, and you can
therefore expect worse performance.

> > With an event driven system, daemon updates would need to be scheduled,
> > and when the daemon was performing an update, I/O could become blocked.
> I'm afraid you don't know what you are talking about.  The daemon does the
> update in less than a tenth of a second.  We don't need to process the I/O
> from the clients during that time, since the results would likely be
> buffered by the ntserv for inclusion into the next update.  The data can
> wait.

I don't mean blocked as in a write() blocking. I mean blocked as in
no data can be transferred between any of the clients and the
server. For "less than a tenth of a second", ten times a second, the
server will be unable to exchange data. Just think about it for a bit.

> > only to throw it away as soon as it realizes: "Hey, 8
> > people have data for me." For more real-time systems such as Netrek,
> > there is almost always data to be read. Why bother switching yourself,
> > when you can let the system do it for you?
> Because system switching costs more CPU cycles.  I can guarantee that.
> Netrek may look real-time, but by comparison to other real-time systems,
> Netrek is very simple and slow.  Only ten updates per second for 20
> clients?  Big deal.

1) The system is currently written to access shared memory. This is
   practically already a threaded model. An event driven model, by
   comparison, is far different.

2) Computers these days have gigabytes of hard drive and gigahertz of
   clock speed. Therefore, let's make Netscape 50 Mbytes big in memory
   just to display a simple HTML page. Throw in XEmacs, and Microsoft
   Office. After all, we can always tell our customers to buy a bigger
   machine...

   This line of thinking has a tipping point that shouldn't be crossed.
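Point 1 can be illustrated with a small sketch. Two processes that share one mapped region already communicate the way threads do, by reading and writing the same memory. For illustration this uses an anonymous MAP_SHARED mmap() and fork(), which is an assumption; the server's own shared-memory setup is not shown here and may differ. struct shared and child_writes() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a segment that every ntserv process maps. */
struct shared { int ship_x; };

/* Map a region visible to this process and to any children it forks. */
struct shared *shared_create(void)
{
    void *p = mmap(NULL, sizeof(struct shared), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (struct shared *)p;
}

/* Fork a "daemon" child that writes into the segment. The parent sees
   the new value without any message passing, just as a thread would. */
int child_writes(struct shared *s, int value)
{
    assert(s != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child: update shared state and exit */
        s->ship_x = value;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return s->ship_x;        /* parent: read what the child wrote */
}
```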

> > Threading is very neat and tidy. As I'm sure you are aware, Netrek
> > already deals with shared memory, which is not really that different
> > from the way threads function. The detail you would need to fiddle
> > with for threads is that you would want to avoid using signals to
> > wake up your threads, and instead use a condition variable or a wait.
> Yes, I know how to thread it, but having been there and done both
> threading and event driven models over my (ahem) 20 years professional
> programming career, I'm certain that an event driven model would be
> - easier to maintain,
> - just as effective,
> - cheaper to build,
> - simpler to debug, (lack of concurrency and race issues)
> - more efficient than a threaded model.

I don't know why it would be easier to maintain, when Netrek already
uses shared memory. As for 'just as effective', this remains to be seen.
Cheaper to build? Why would changing the model be cheaper to build?
More efficient? Only if the threaded model was not done properly.

Simpler to debug? Yes.

> - Netrek does not need to restrict the developer base by choosing a 
>   difficult technology.  We need to do the reverse.

Most people should have pthreads by now.
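The quoted suggestion to wake threads with a condition variable rather than signals might look like the following minimal pthreads sketch. One daemon-like caller bumps a frame counter and broadcasts, and each client thread sleeps until it sees a newer frame. The names struct tick, tick_publish(), tick_wait() and client_main() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared update counter, bumped once per daemon update. */
struct tick {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned long   frame;
};

void tick_init(struct tick *t)
{
    int rc = pthread_mutex_init(&t->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&t->changed, NULL);
    assert(rc == 0);
    (void)rc;   /* avoid an unused-variable warning under NDEBUG */
    t->frame = 0;
}

/* Daemon side: publish a new frame and wake every waiting client thread. */
void tick_publish(struct tick *t)
{
    pthread_mutex_lock(&t->lock);
    t->frame++;
    pthread_cond_broadcast(&t->changed);
    pthread_mutex_unlock(&t->lock);
}

/* Client side: block until a frame newer than 'seen' exists, then return it.
   The while loop also handles spurious wakeups. */
unsigned long tick_wait(struct tick *t, unsigned long seen)
{
    pthread_mutex_lock(&t->lock);
    while (t->frame <= seen)
        pthread_cond_wait(&t->changed, &t->lock);
    unsigned long f = t->frame;
    pthread_mutex_unlock(&t->lock);
    return f;
}

/* Example per-client thread body: wait for one update and record it. */
struct client { struct tick *t; unsigned long seen; };

void *client_main(void *arg)
{
    struct client *c = arg;
    c->seen = tick_wait(c->t, c->seen);
    return NULL;
}
```

Because the predicate is checked under the mutex, it does not matter whether tick_publish() runs before or after the client starts waiting. A client that arrives late simply finds the newer frame and returns at once.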

> - Netrek is not a place in which we should demonstrate our coding 
>   prowess for an examiner.  We're not a training ground.

So leave it as is, then. Just as many people misunderstand event loops
as misunderstand threading.

> > I suspect it would take less time to make the server threaded, than it
> > would take to make it event driven.
> You start on the threading, I'll start on the event driven model.
> Show us the code.  Who will judge between us?

Hmmm... :-)

> I'm worried that you're only discussing this in order to get me to act.
> It's a conspiracy.  ;-)

It's entirely possible...

Tell you what... I'm leaving for Australia for my Christmas
vacation. At least some of those days (Dec. 19th -> Jan 4th) I'll
probably be pretty bored. You do your event driven model, and I'll do
the thread model. We'll see which patch is smaller... :-)

mark

-- 
mark at mielke.cc/markm at ncf.ca/markm at nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/