Friday, April 23, 2010

Confessions of a Summer Intern interviewer

I am currently interviewing students for a summer internship on my team at Yahoo! Interviewing these "youngsters" always makes me think back to when I was just getting started, and to how badly I interviewed back then.

Having interviewed many people over the years, I have noticed patterns that separate the interviewees who are likely to move on from those who are not. Here are a few pointers for those interviewing for positions:

Don't be over eager!

I just interviewed an undergrad student who ends most of his sentences with exclamation points! He also confided in me that Yahoo! is his absolute dream job. Engineers are pretty sober people. "Just the facts, ma'am." We are also pretty sincere people. Nothing expresses insincerity more than being too eager.

Don't BS

Another fellow insisted on spending the entire phone screen spouting every buzzword surrounding web application development. However, he never said anything concrete. Don't talk about technology; talk about what you have done with technology. What problems have you solved? What problems have you run into? Be concrete.

Don't talk too much

This is related to the BS recommendation. One of the key skills of an engineer is the ability to listen. If you're talking too much, then you are not listening.

Don't interrupt your interviewer

Even if you know exactly what your interviewer is about to say, don't interrupt. Find an opening and politely inform him or her of your familiarity with the topic. Or better yet, don't say anything at all. This is a pet peeve of mine.

Be prepared

You have certainly done one or two projects that have interested you. Be prepared to describe your project succinctly. What technologies did you use? Why? What mistakes did you make?

Remember why you love your field

There are things about your work that you thought were really cool. Bring that out. This doesn't conflict with the "over-eagerness" warning, since this is sincere.

Be yourself

This is so hackneyed that I won't go into it. You know what I mean. Be a geek.

Tuesday, August 15, 2006

SILICA: This is a product I had figured was inevitable...

Tell SILICA to scan every machine on every wireless network for file shares and download anything of interest to the SILICA device. Then just put it in your suit pocket and walk through your target's office space.

IMMUNITY : Knowing You're Secure

In my sweaty fantasy life I had imagined such a product: a self-contained, pocket-sized Linux box with all of the exploit software you could want.

If I were a network admin I would be having difficulty sleeping.  Actually I would be filling out a PO for one of these to see if I could preemptively crack the system.  In fact, I would purchase several and have my staff carry them around just to see if they find anything.

It would be great if the device would light up or buzz when it found a vulnerability.  That way you could carry it around like a cell phone or pager and wait for it to let you know if it found anything.

If it works like they say, it is very, very cool...


Monday, June 19, 2006

Python Perl interop

This is something I've been looking for for a while. I'm a huge fan of scripting languages, however their interop story is lousy. Each can talk to C pretty well, however they can't talk to each other.

This is surprising given that they all run on virtual machines. Microsoft's .NET framework does an excellent job of language interop (with the caveat that your language must conform to the CLR). I hope that someone takes this up in the future.

Cheese Shop: PythonPerl 0.9
Allows embedding of Perl in Python.
Author: Bruno Obsomer
Download URL: http://ece.fsa.ucl.ac.be/bobsomer/PythonPerl.tgz
Description: This package allows you to put true Perl code into a Python program. It can: execute a string of Perl code, execute a Perl file, and set and get Perl variables in Python. The code will be executed by a true Perl interpreter (which must therefore be installed separately).
License: GPL
Cheese Shop Owner: bobzomer

Python Cheese Shop : PythonPerl 0.9
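For comparison, here is a minimal sketch of the blunt workaround most of us fall back on: shelling out to a separate Perl interpreter from Python, rather than the true in-process embedding PythonPerl promises. The `run_perl` helper is my own name for illustration, and the sketch assumes `perl` is on the PATH.

```python
import shutil
import subprocess

def run_perl(code):
    """Run a snippet of Perl code in a child process and return its stdout."""
    result = subprocess.run(
        ["perl", "-e", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Guard on perl being installed, since this is out-of-process interop.
if shutil.which("perl"):
    print(run_perl("print 6 * 7;"))  # prints "42"
```

The obvious weakness is the one the post complains about: everything crosses a process boundary as text, so you get no shared variables and no shared objects.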


Friday, November 19, 2004

Threads are not an architectural element

What do I mean by this, and why am I taking the energy to write about it?

Threads, as I have mentioned in an earlier post, are a means of solving a narrow set of problems. They are a tool, just like a database, XML/XSLT, HTML, ... are tools. They are not a way of life.

Let me tell you a sad, and most unfortunate story...

Once upon a time at a high-tech start-up, a new manager was hired to clean up a mess. This mess was caused by a naive developer who was being very clever.

This clever developer had created an "architecture" with some very clever building blocks. These building blocks were Microsoft COM components, each of which had two interfaces and its own thread of execution. The interfaces were very clever and "elegant". One interface was "DoIt(String)" and the other was "String ReportIt(String)".

The application built on this architecture was a large stack of these building blocks. Each block would serialize a command and send it to the next. Each call was asynchronous because each clever block had its own thread.
Function calls would rattle through these blocks and each block did some "useful" work.

Unfortunately this design suffered from the following maladies and ultimately had to be euthanized:

  1. Poor, poor, poor, poor performance. Waking a thread costs time. Crossing several thread boundaries to get work done was many times slower than letting a single thread do the work.
  2. Large memory footprint. Each of these components was its own DLL. The COM overhead and the serialization/de-serialization code was often larger than the application code.
  3. Impractical to debug. Even with the use of trace statements it was nearly impossible to debug. We went to heroic lengths to tag trace statements so that we could follow the logical thread of execution, but we never came up with a good scheme.
  4. Non-deterministic behavior. Once you get beyond three or four threads in an application you lose determinism. The application becomes very sensitive to the operations of other applications on the computer. Your critical threads get swapped out at inopportune times. Timing dependent defects come and go arbitrarily.
  5. Costly to modify. Since each change alters the dynamics of the system (e.g. timing is changed) entirely new classes of defects emerge with even the most innocent change.
When deployed this application had 40 threads of execution. We made it work by sheer will, but it nearly killed the product. Over time the application was reworked to have 2 threads. The new version took half the number of lines of code, was faster, and had more features.
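The performance malady is easy to reproduce. Here is a hypothetical Python sketch (not the original COM code) of the thread-per-component design: a stack of "magic boxes", each with its own thread and input queue, pitted against plain function calls doing identical work.

```python
import queue
import threading
import time

class Box:
    """A hypothetical 'magic box': one component, one input queue, one thread."""
    def __init__(self, work, downstream, done):
        self.inbox = queue.Queue()
        self.work = work
        self.downstream = downstream
        self.done = done
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            item = self.inbox.get()
            result = self.work(item)
            if self.downstream is not None:
                self.downstream.inbox.put(result)  # cross a thread boundary
            else:
                self.done.put(result)

def inc(x):
    return x + 1

done = queue.Queue()

# A stack of five boxes, each doing a trivial unit of work.
chain = None
for _ in range(5):
    chain = Box(inc, chain, done)

start = time.perf_counter()
for _ in range(1000):
    chain.inbox.put(0)
results = [done.get() for _ in range(1000)]
threaded = time.perf_counter() - start

# The same work as plain function calls on a single thread.
start = time.perf_counter()
results_direct = [inc(inc(inc(inc(inc(0))))) for _ in range(1000)]
direct = time.perf_counter() - start

print(results[0], results_direct[0])  # both 5
print(threaded > direct)              # the queue hops and wakeups dominate
```

Both versions compute the same answers, but the threaded stack spends nearly all of its time waking threads and hopping queues rather than doing the work.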

I have since found other instances where developers have followed similar approaches, with the same results. Threads are a tool that you use to implement solutions; they have absolutely nothing to do with software architecture.

So why the fascination with using threads as an architectural element? What is the allure?

Most engineers use a "divide and conquer" approach to solving problems. Big problems are recursively broken into smaller problems, and each sub-problem is then solved. The threaded component design provides a box into which you can place the solution to a sub-problem. This box has its own thread, so all you have to do is present it with an input and it will notify you when it is done processing. It's a magic box that problems go into and solutions come out of. How nice!

Here we illustrate a big problem being decomposed into smaller sub-problems.

Problem Decomposition

As an engineer I can decompose my problem into a series of independent sub-problems and assign each one to a magic box. I implement the solution to a sub-problem, stick it in the box, and put it on the shelf. I then string together all of these sub-problem solutions to create my solution. Better yet, I can reuse my magic boxes in other solutions. Since each is self-contained with its own thread of execution, I can drop it into a different solution and it will just work.

This diagram shows a problem decomposition directly mapped onto a set of threaded components. Unless your problem domain is boxes, such an architecture will get you nowhere.

Problem Decomposition

Yes, we engineers have rich fantasy lives. This software architecture will fail in almost all cases.

In the end, a software architecture must contain objects/components that are part of the problem domain it is trying to address. An architecture that uses threads as its elements is devoid of any domain information.

I hope that I have explained the flaws of this approach. I hope that I have also explained its strange appeal. The final advice is: Don't do it!

Monday, November 15, 2004

Threads increase design time

Keeping with my regular "post every couple of months" trend here is the next installment in my discussion of threads.

In addition to pulling from my personal experience I've been doing research on threading (which is half of the reason I write this blog). I have been amazed at the chorus of other voices warning of threading problems. There are many others with credentials better than mine who are calling out the warning. So why bother again? Well, the reason is: you can't have too many people warning you not to run with scissors! Perhaps my small contribution will be the last straw that makes someone reconsider their multi-threaded application design. If so, my time has been well spent.

To the topic at hand.

Threads Increase Design Time

This is pretty simple. In the same way that it is infinitely easier to toss a single ball in the air as opposed to juggling three balls, it is easier to write a single threaded application than a multi-threaded application.

When designing a multi-threaded application you have many additional elements to consider:

  1. Concurrency

    • Each time you access memory you must carefully think through whether two threads could be executing that code segment at the same time.

  2. Performance

    • How do you ensure that your threads are sleeping most of the time?

    • How do you prevent your threads from executing at the same time, which will impair your performance?

    • Does your locking mechanism require a call to the kernel? If so, are you prepared for the performance hit of acquiring locks?

    • Are you locking too much code for too long? Are you forcing your other threads to wait unnecessarily?


  3. Deadlock
    • If you have more than one lock can your threads deadlock?

    • What is your strategy for resolving/preventing deadlocks?


  4. State
    • Making an asynchronous call breaks the flow of your code. You make the call and expect a response at a later time. When you make the call you must save your state so that when the call eventually returns you can continue from where you left off. This is a major design consideration that is more often than not implemented poorly. Organizing your modules to have a single place where asynchronous returns are handled helps.
    • What happens when an asynchronous call doesn't return? Are you tracking the time of your calls? What is your timeout policy?

  5. Resource consumption and memory management
    • If one thread allocates memory, who deallocates it? Managing the lifetime of global resources (memory, file pointers, database connections) becomes especially difficult. Which thread "owns" the resource? How are you sure that all threads are done using the resource?

    • How many threads and locks are you using? These are kernel resources, and they are not free.

  6. Design for testing
    • Current debuggers are (and have been for over a decade) ineffective at tracking problems in multi-threaded applications. How will you debug your code? What tools will you use?

    • You will make use of extensive trace statements in your code (this is the answer to the previous question). What trace library will you use? How will you deploy your trace statements such that they don't significantly change the behavior of your code? What tools will you use to analyze the many megabytes of trace output produced?
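The deadlock questions above are worth making concrete. Here is a minimal Python sketch of the classic lock-ordering deadlock: two threads acquire the same two locks in opposite orders. The timeouts exist only so the demonstration detects the stall instead of hanging forever.

```python
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()
report = []

def worker(name, first, second):
    with first:
        time.sleep(0.1)                   # widen the race window
        if second.acquire(timeout=1.0):   # timeout so the demo cannot hang
            second.release()
            report.append((name, "ok"))
        else:
            report.append((name, "deadlocked"))

# Thread 1 takes A then B; thread 2 takes B then A: inconsistent ordering.
t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
print(report)  # at least one thread reports "deadlocked"
```

The standard prevention strategy is exactly what this sketch violates: pick a global ordering for your locks and always acquire them in that order.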
It would seem that I am painting a bleak picture of multi-threaded development. Well, I am, and it is well founded. This is one of the primary reasons that J2EE, COM+, and other enterprise frameworks make you write single-threaded code. These are highly multithreaded frameworks, but they make writing applications easier by hiding this fact.

Wednesday, August 25, 2004

Threads impair performance

Well I'm back after a vacation and chasing kids around. So my next entry is on how threads impair performance.

The basic reason that threads impair program performance is that each thread you spawn is competing for limited resources such as CPU time, memory, or I/O bandwidth. When you spawn a thread you are competing with yourself. The more threads, the fewer resources for each.

That observation doesn't sound very convincing, since modern operating systems are designed to share resources fairly. The difference with threads is that they consume CPU timeslices. One of the jobs of an operating system is to split the CPU between all processes running on the system. Rather complex algorithms have been developed to ensure fairness in a wide variety of situations.

When you spawn a second thread your application potentially has twice the CPU needs, but it will not get twice the number of timeslices. All you have accomplished is limiting the amount of CPU time each of your threads gets. Each successive thread just digs you into a deeper hole.

But what about multiple-CPU machines? Sorry no dice. The operating system controls access to the CPUs. There is no way to guarantee that your threads will be efficiently scheduled to take advantage of the processors. Some operating systems allow processor affinity to be specified for threads. However at best this is only a hint.

A secondary effect is the impact multiple threads have on CPU cache coherency. Each thread has its own program counter and stack pointer. When a thread is scheduled these pointers must be re-established, which takes time. When the new thread runs it establishes a new pattern of memory access in the CPU cache, which churns the pattern left by your other thread. Cache misses are expensive.

Up until now I've made the implicit presumption that your threads are running free and competing with each other. Obviously this is the worst-case scenario. Let's consider a more benign situation. We have two threads, one performing computation and the other waiting on a relatively long I/O call. The computing thread's job is to perform operations on the results returned by the I/O thread.

All is good right? Your application is maximizing the use of computing resources. This should be very efficient. It is not! To see why consider the following cases:

Case 1: The computing thread wakes when the I/O thread has posted new results. In this case there is no overlap between the threads, so they are not in contention for the same resources. However, nothing has been gained. First, there is the thread context-switch overhead that is now incurred. Second, on a modern computer calculations take a vanishingly short amount of time compared to I/O. All you have done in this case is gain a couple of microseconds of overlap with the I/O. Overall your application will not run any faster than the I/Os that it needs to perform. If this application were to perform a great deal of computation per I/O, and execute billions of I/Os, you might gain an overall benefit. More likely than not these benefits will be swamped by context-switching and lock overhead.

Case 2: The computation thread is still active when the I/O thread wakes up. You are so dead. You now have two threads fighting for CPU time. Your performance will suffer greatly.
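The handoff overhead in Case 1 is easy to measure for yourself. A rough Python sketch (illustrative only; the worker thread stands in for the I/O thread): pass each unit of work through a second thread, then do the same work inline on one thread.

```python
import queue
import threading
import time

def double(x):
    return x * 2

tasks, results = queue.Queue(), queue.Queue()

def io_stand_in():
    # Stand-in for the I/O thread: receive a request, post a result.
    while True:
        x = tasks.get()
        if x is None:
            return
        results.put(double(x))

threading.Thread(target=io_stand_in, daemon=True).start()

n = 10_000
start = time.perf_counter()
for i in range(n):
    tasks.put(i)
    results.get()            # block until the other thread answers
cross_thread = time.perf_counter() - start
tasks.put(None)

start = time.perf_counter()
for i in range(n):
    double(i)                # the same work, no thread boundary
inline = time.perf_counter() - start

print(f"cross-thread: {cross_thread:.4f}s  inline: {inline:.4f}s")
```

The cross-thread version pays two queue operations and a wakeup per item; the inline version pays one multiplication. The gap is typically orders of magnitude.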

There is a great discussion of avoiding thread optimizations here. Don't just take my word for it.

Friday, June 11, 2004

The History of threads

Threads have not always been around. The first versions of Unix only had processes. Versions of Windows prior to Windows 95 didn't even have processes. The Usenet group FAQ for comp.os.research has an excellent history of threads.

Prior to the development of threads the developer was limited to using processes. A process is a heavyweight object that includes its own memory space, code space, heap space, and stack space. This means that if you needed to do a task in the background you had two choices:

1. You could spawn an entirely new process. Since a process has its own memory space, you needed to develop a means of communicating between the two processes. Some of the choices were shared memory, signals, shared files, pipes, or sockets. All of these approaches were complex and difficult to get right. On top of that, these means of "interprocess communication" were all heavyweight.

On older Unix systems you had the problem of Zombie processes which couldn't be killed. These were caused by one process spawning another and then not cleaning up properly.
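In modern terms, option 1 can be sketched in a few lines of Python; the child here is just another Python interpreter standing in for any spawned program, and the channel is a pipe, one of the IPC choices listed above.

```python
import subprocess
import sys

# Spawn an entirely separate process and talk to it over a pipe.
child = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print(sys.stdin.readline().strip().upper())"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
# communicate() sends the input, reads the output, and reaps the child --
# the cleanup step that careless parents skipped, leaving zombies behind.
out, _ = child.communicate("background work\n")
print(out.strip())  # prints "BACKGROUND WORK"
```

Even with today's conveniences, note how much machinery is involved just to hand a string to a "background task" and get one back.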

2. You could emulate a background task through a message-loop style design. In this case you had an infinite loop that serviced a queue of tasks. Work to be done was added to the queue. You were really implementing your own multitasking layer on top of the OS. This is the technique used by the evil Motif widgets package as well as all versions of Microsoft Windows prior to Windows 95.
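Option 2, the message-loop emulation, can be sketched in miniature: a single thread, an explicit queue of pending tasks, and a loop that services them one at a time.

```python
import collections

tasks = collections.deque()
log = []

def post(task):
    """Add work to the queue; anyone, including a running task, may post."""
    tasks.append(task)

def message_loop():
    """The single thread services queued tasks one at a time, in order."""
    while tasks:
        tasks.popleft()()

post(lambda: log.append("redraw"))
post(lambda: log.append("save file"))
post(lambda: post(lambda: log.append("follow-up")))  # a task posting more work
message_loop()
print(log)  # ['redraw', 'save file', 'follow-up']
```

Because only one task runs at a time, there is no concurrency to reason about; the price, as in pre-95 Windows, is that a long-running task freezes everything behind it.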

As you can see, implementing a background task was a significant effort. This was not as bad as it sounds: it forced developers to think through their designs. Only the most worthy uses for background tasks were actually implemented. Again, this is good, since the use of background tasks is easily an order of magnitude more difficult and complex than single-threaded programming.

Work on operating systems that supported threads began in the mid-1960s. Threading packages began to show up in mainstream Unix releases in the early 1980s.

Threads were hailed as the solution to many problems in systems development. A thread has its own stack and program counter, but shares the memory space and heap of its parent task. Communication between threads stays within the same task. Communication can be as easy as pointing to a common place in memory. (Sure it is ...)
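The parenthetical skepticism is warranted: even the "easy" shared-memory communication needs synchronization the moment two threads touch the same data. A minimal Python sketch of two threads updating one shared counter:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:        # without this, the read-modify-write of the two
            counter += 1  # threads can interleave and lose updates

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000, guaranteed only because of the lock
```

So yes, the threads communicate by pointing at the same memory, but the moment they do, you own all of the locking, ordering, and lifetime questions from the previous post.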

What most folks don't understand is that all threads did was lower the barrier to creating multitasking applications. They did nothing to make multitasking development any easier. Using multiple threads is still that order of magnitude more difficult than single threading.

I like to think of the emergence of threads like this. Imagine if a cheap, easy, and widely available means of creating weapons-grade uranium were found. Everyone with a little bit of initiative could churn out kilograms of this material. Would this be good for the world? Would folks who have no business messing with this material be playing with it? Would some idiot try to find a way to integrate it into a child's toy?

I would not want to go back to systems without threads, but they have only made the job of software project management more difficult.