1. 13 Jan, 2015 1 commit
    • coroutine-ucontext: use __thread · d1d1b206
      Paolo Bonzini authored
      ELF thread local storage is about 10% faster on tests/test-coroutine's
      perf/cost test.  The timing on my machine is 190ns per iteration with
      pthread TLS, 170 with ELF TLS.
      
      Based on a patch by Kevin Wolf and Peter Lieven, but redone to follow
      the model of coroutine-win32.c (including the important "noinline"
      attribute!).
      
      Platforms without thread-local storage (OpenBSD probably?) will need
      a new-enough GCC for this to compile, in order to use the same emutls
      support that Windows already relies on.

      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Message-id: 1417518350-6167-2-git-send-email-pbonzini@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
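The pattern in the commit above, a `__thread` variable reached only through a `noinline` accessor, can be sketched as follows. This is a minimal illustration, not the code from coroutine-ucontext.c: the `Coroutine` stand-in and the `demo_*` names are hypothetical. The commit message does not say why `noinline` matters; the comment below gives one plausible reason and is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real coroutine type. */
typedef struct Coroutine {
    int id;
} Coroutine;

/* ELF TLS: each thread gets its own copy, addressed directly by the
 * compiler and linker rather than looked up with pthread_getspecific(). */
static __thread Coroutine leader;
static __thread Coroutine *current;

/* Assumption: noinline keeps the TLS address computation inside this
 * function instead of letting it be merged into (and cached by) callers. */
__attribute__((noinline))
static Coroutine *demo_coroutine_self(void)
{
    if (current == NULL) {
        current = &leader;   /* first use in this thread */
    }
    return current;
}

__attribute__((noinline))
static void demo_set_current(Coroutine *co)
{
    current = co;
}
```

Callers would go through `demo_coroutine_self()` and `demo_set_current()` only and never touch `current` directly, so every TLS access stays behind the accessors.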
  2. 17 Apr, 2013 1 commit
  3. 23 Feb, 2013 1 commit
    • Replace all setjmp()/longjmp() with sigsetjmp()/siglongjmp() · 6ab7e546
      Peter Maydell authored
      The setjmp() function doesn't specify whether signal masks are saved and
      restored; on Linux they are not, but on BSD (including MacOSX) they are.
      We want to have consistent behaviour across platforms, so we should
      always use "don't save/restore signal mask" (this is also generally
      going to be faster). This also works around a bug in MacOSX where the
      signal-restoration on longjmp() affects the signal mask for a completely
      different thread, not just the mask for the thread which did the longjmp.
      The most visible effect of this was that ctrl-C was ignored on MacOSX
      because the CPU thread did a longjmp which resulted in its signal mask
      being applied to every thread, so that all threads had SIGINT and SIGTERM
      blocked.
      
      The POSIX-sanctioned portable way to do a jump without affecting signal
      masks is to siglongjmp() to a sigjmp_buf which was created by calling
      sigsetjmp() with a zero savemask parameter, so change all uses of
      setjmp()/longjmp() accordingly. [Technically POSIX allows sigsetjmp(buf, 0)
      to save the signal mask; however the following siglongjmp() must not
      restore the signal mask, so the pair can be effectively considered as
      "sigsetjmp/siglongjmp which don't touch the mask".]
      
      For Windows we provide a trivial sigsetjmp/siglongjmp in terms of
      setjmp/longjmp -- this is OK because no user will ever pass a non-zero
      savemask.
      
      The setjmp() uses in tests/tcg/test-i386.c and tests/tcg/linux-test.c
      are left untouched because these are self-contained single-threaded
      test programs intended to be run under QEMU's Linux emulation, so they
      have neither the portability nor the multithreading issues to deal with.

      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Tested-by: Stefan Weil <sw@weilnetz.de>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
  4. 22 Feb, 2013 1 commit
  5. 12 Jan, 2013 1 commit
  6. 19 Dec, 2012 1 commit
  7. 31 Jul, 2012 1 commit
    • configure: Split valgrind test into pragma test and valgrind.h test · 06d71fa1
      Peter Maydell authored
      Split the configure test that checks for valgrind into two, one
      part checking whether we have the gcc pragma to disable unused-but-set
      variables, and the other part checking for the existence of valgrind.h.
      The first of these has to be compiled with -Werror; the second does
      not need -Werror and should not generate any warnings.
      
      This (a) allows us to enable "make errors in configure tests be
      build failures" and (b) enables use of valgrind on systems with
      a gcc which doesn't know about -Wunused-but-set-variable, like
      Debian squeeze.

      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
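A probe for the first half of the split test might look like the sketch below. It is an assumption about the general shape of such a test, not the program in QEMU's configure script, and `demo_pragma_probe` is a hypothetical name. When built with -Werror, a compiler that does not recognise the option named in the pragma is expected to warn about it and so fail the build, which is what lets configure detect missing support.

```c
/* Ask the compiler to silence the warning; an older gcc that does not
 * know this option is expected to complain about the pragma itself. */
#pragma GCC diagnostic ignored "-Wunused-but-set-variable"

static int demo_pragma_probe(void)
{
    int unused_but_set;

    unused_but_set = 1;   /* assigned but never read */
    return 0;
}
```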
  8. 17 Jul, 2012 1 commit
  9. 17 Feb, 2012 1 commit
  10. 15 Dec, 2011 1 commit
    • coroutine: switch per-thread free pool to a global pool · 39a7a362
      Avi Kivity authored
      ucontext-based coroutines use a free pool to reduce allocations and
      deallocations of coroutine objects.  The pool is per-thread, presumably
      to improve locality.  However, as coroutines are usually allocated in
      a vcpu thread and freed in the I/O thread, the pool accounting gets
      screwed up and we end up allocating and freeing a coroutine for every I/O
      request.  This is expensive since large objects are allocated via the
      kernel, and are not cached by the C runtime.
      
      Fix by switching to a global pool.  This is safe since we're protected
      by the global mutex.

      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
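A global free pool of the kind described in the commit above can be sketched as a singly linked list with a size cap. The `DemoCoroutine` type, the `demo_*` names and the cap of 64 are all hypothetical. The sketch has no locking of its own and assumes, as the commit does, that every caller holds the global mutex.

```c
#include <stdlib.h>

enum { DEMO_POOL_MAX = 64 };   /* hypothetical cap on cached objects */

typedef struct DemoCoroutine {
    struct DemoCoroutine *next;   /* link used only while in the pool */
    /* stack, saved context, etc. omitted */
} DemoCoroutine;

/* One pool for all threads; callers must hold the global mutex. */
static DemoCoroutine *demo_pool;
static unsigned demo_pool_size;

static DemoCoroutine *demo_coroutine_new(void)
{
    DemoCoroutine *co = demo_pool;

    if (co != NULL) {             /* reuse a cached object */
        demo_pool = co->next;
        demo_pool_size--;
        co->next = NULL;
        return co;
    }
    return calloc(1, sizeof(*co));
}

static void demo_coroutine_delete(DemoCoroutine *co)
{
    if (demo_pool_size < DEMO_POOL_MAX) {   /* cache instead of freeing */
        co->next = demo_pool;
        demo_pool = co;
        demo_pool_size++;
        return;
    }
    free(co);
}
```

Because the pool is shared, an object allocated in one thread and released in another still returns to the same list, which avoids the accounting imbalance the commit describes.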
  11. 21 Aug, 2011 1 commit
  12. 08 Aug, 2011 1 commit
  13. 01 Aug, 2011 1 commit
    • coroutine: introduce coroutines · 00dccaf1
      Kevin Wolf authored
      Asynchronous code is becoming very complex.  At the same time
      synchronous code is growing because it is convenient to write.
      Sometimes duplicate code paths are even added, one synchronous and the
      other asynchronous.  This patch introduces coroutines which allow code
      that looks synchronous but is asynchronous under the covers.
      
      A coroutine has its own stack and is therefore able to preserve state
      across blocking operations, which traditionally require callback
      functions and manual marshalling of parameters.
      
      Creating and starting a coroutine is easy:
      
        coroutine = qemu_coroutine_create(my_coroutine);
        qemu_coroutine_enter(coroutine, my_data);
      
      The coroutine then executes until it returns or yields:
      
        void coroutine_fn my_coroutine(void *opaque) {
            MyData *my_data = opaque;
      
            /* do some work */
      
            qemu_coroutine_yield();
      
            /* do some more work */
        }
      
      Yielding switches control back to the caller of qemu_coroutine_enter().
      This is typically used to switch back to the main thread's event loop
      after issuing an asynchronous I/O request.  The request callback will
      then invoke qemu_coroutine_enter() once more to switch back to the
      coroutine.
      
      Note that if coroutines are used only from threads which hold the global
      mutex they will never execute concurrently.  This makes programming with
      coroutines easier than with threads.  Race conditions cannot occur since
      only one coroutine may be active at any time.  Other coroutines can only
      run across yield.
      
      This coroutine implementation is based on the gtk-vnc implementation
      written by Anthony Liguori <anthony@codemonkey.ws> but it has been
      significantly rewritten by Kevin Wolf <kwolf@redhat.com> to use
      setjmp()/longjmp() instead of the more expensive swapcontext() and by
      Paolo Bonzini <pbonzini@redhat.com> for Windows Fibers support.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
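The enter/yield control flow described in the commit above can be illustrated with a small standalone sketch. It is not QEMU's implementation: it switches stacks with plain `swapcontext()` for brevity, whereas the commit says the real code uses setjmp()/longjmp(); it passes the coroutine to yield explicitly instead of tracking a current coroutine; and every `Demo*`/`demo_*` name is hypothetical. It assumes a single thread and a platform that provides `<ucontext.h>`.

```c
#include <stdlib.h>
#include <ucontext.h>

enum { DEMO_STACK_SIZE = 64 * 1024 };   /* hypothetical stack size */

typedef struct DemoCo DemoCo;
typedef void DemoEntry(DemoCo *co, void *opaque);

struct DemoCo {
    ucontext_t ctx;      /* where the coroutine resumes */
    ucontext_t caller;   /* where yield or termination returns to */
    DemoEntry *entry;
    void *opaque;
    void *stack;
    int done;
};

static DemoCo *demo_starting;   /* handoff to the trampoline */

static void demo_trampoline(void)
{
    DemoCo *co = demo_starting;

    co->entry(co, co->opaque);
    co->done = 1;
    /* Returning activates uc_link, i.e. the last caller of enter. */
}

static DemoCo *demo_co_create(DemoEntry *entry)
{
    DemoCo *co = calloc(1, sizeof(*co));

    if (co == NULL) {
        return NULL;
    }
    co->stack = malloc(DEMO_STACK_SIZE);
    if (co->stack == NULL || getcontext(&co->ctx) != 0) {
        free(co->stack);
        free(co);
        return NULL;
    }
    co->ctx.uc_stack.ss_sp = co->stack;
    co->ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
    co->ctx.uc_link = &co->caller;
    co->entry = entry;
    makecontext(&co->ctx, demo_trampoline, 0);
    return co;
}

/* Runs the coroutine until it yields or returns. */
static void demo_co_enter(DemoCo *co, void *opaque)
{
    if (co->done) {
        return;
    }
    co->opaque = opaque;   /* only read when the coroutine first starts */
    demo_starting = co;
    swapcontext(&co->caller, &co->ctx);
}

/* Switches back to whoever called demo_co_enter(). */
static void demo_co_yield(DemoCo *co)
{
    swapcontext(&co->ctx, &co->caller);
}

static void demo_co_delete(DemoCo *co)
{
    free(co->stack);
    free(co);
}

/* Example body: records progress before and after a yield. */
static void demo_counter(DemoCo *co, void *opaque)
{
    int *steps = opaque;

    *steps = 1;
    demo_co_yield(co);
    *steps = 2;
}
```

Entering `demo_counter` once runs it up to the yield; entering it a second time resumes after the yield and lets it finish, mirroring the request/callback pattern in the commit message.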