Finer Granularity
Once you have a lock around everything, the next step is to try to move things out of the lock. In a kernel, for example, you can interact with the filesystem and networking subsystems independently. The network stack creating a TCP/IP packet from a stream of bytes for one process shouldn't affect a request from another process to load some data from disk. You can add two locks, one for each subsystem, and have the relevant calls hold the subsystem lock only while they perform work entirely on that subsystem.
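As a rough illustration of the two-lock arrangement, here is a minimal sketch in Python rather than kernel C. The names fs_lock, net_lock, read_block, and send_bytes are hypothetical stand-ins for real subsystem entry points, and the dictionary and list stand in for subsystem state.

```python
import threading

# One lock per subsystem instead of a single global lock.
fs_lock = threading.Lock()
net_lock = threading.Lock()

disk_cache = {}   # state owned by the (hypothetical) filesystem subsystem
send_queue = []   # state owned by the (hypothetical) networking subsystem


def read_block(block_id):
    """Filesystem call: holds only the filesystem lock."""
    with fs_lock:
        return disk_cache.setdefault(block_id, f"data-{block_id}")


def send_bytes(payload):
    """Networking call: holds only the network lock."""
    with net_lock:
        send_queue.append(payload)
        return len(send_queue)


def demo():
    # A thread reading from disk and a thread queuing a packet never
    # contend with each other, because they take different locks.
    threads = [
        threading.Thread(target=read_block, args=(7,)),
        threading.Thread(target=send_bytes, args=(b"hello",)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return disk_cache, send_queue
```

Each function touches only the state of its own subsystem, which is what makes it safe to hold just that subsystem's lock.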
This approach has the nice advantage that it can be done incrementally. For example, you might start with a lock for the entire networking subsystem, and then split that giant lock into one lock for each of the layers in the stack. You can also subdivide horizontally, for example by having one lock per connection per layer of the stack.
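The horizontal split might look something like the following sketch, which shows a single layer only. The Connection class and its bytes_sent field are assumptions made for illustration; a real stack would keep one such lock per connection in each layer.

```python
import threading


class Connection:
    """Hypothetical per-connection state with its own lock."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.lock = threading.Lock()  # protects only this connection
        self.bytes_sent = 0

    def send(self, payload):
        # Senders on the same connection serialize here; senders on
        # other connections take a different lock and are not delayed.
        with self.lock:
            self.bytes_sent += len(payload)


def worker(conn, sends):
    for _ in range(sends):
        conn.send(b"abcd")


def demo(num_connections=4, sends_per_thread=1000):
    conns = [Connection(i) for i in range(num_connections)]
    # Two threads per connection, so each per-connection lock is
    # actually shared by more than one thread.
    threads = [
        threading.Thread(target=worker, args=(conn, sends_per_thread))
        for conn in conns
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [conn.bytes_sent for conn in conns]
```

Because the lock lives inside the object it protects, adding a connection adds a lock automatically, and no thread ever needs a lock belonging to a connection it is not using.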
The more locks you have, the higher the potential for concurrency, but also the more time you'll spend acquiring and releasing those locks, which is a relatively expensive operation. It's very easy to get carried away and discover that your code is now happily spread across a 64-processor machine, with each processor spending 90% of its time acquiring and releasing locks.
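One way to get a feel for this cost is to time the same loop with and without a lock around its body. The sketch below is a crude, single-threaded measurement in Python, with hypothetical function names. The lock is never contended, so it shows only the baseline cost of acquiring and releasing, and the absolute numbers say nothing about kernel locks on a multiprocessor.

```python
import threading
import time


def count_plain(n):
    total = 0
    for _ in range(n):
        total += 1
    return total


def count_locked(n, lock):
    total = 0
    for _ in range(n):
        with lock:  # acquire and release on every iteration
            total += 1
    return total


def measure(n=200_000):
    lock = threading.Lock()

    start = time.perf_counter()
    plain = count_plain(n)
    plain_secs = time.perf_counter() - start

    start = time.perf_counter()
    locked = count_locked(n, lock)
    locked_secs = time.perf_counter() - start

    # Both loops do the same useful work; any extra time in the
    # second one was spent on the lock itself.
    return plain, locked, plain_secs, locked_secs
```

When the protected work is as small as a single increment, the locking can easily dominate the running time, which is the situation the paragraph above warns about.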
There's no benefit from having two locks if you need both of them in order to do anything. Try to make your locks as independent as possible. Ideally, you should never need more than one lock at a time.