Integrating M5 and GEMS
We're having a coding sprint on January 13, 2009. The sprint begins at 9AM PST/11AM CST. We will begin with a phone call on Nate's conference line and use IRC throughout the day.
To get a "working" unified simulator by the end of the day.
- Unified build environment using scons -- Arka w/ Nate and Steve supervising
- Support system call emulation mode
- Support full system mode
- atomic support, especially load locked/store conditional -- Derek
- pio support
- Deal with lack of first-class data support in Ruby
- Configuration management
- Testing infrastructure base on m5 infrastructure
- Which tests?
- What run modes to support? A fast Ruby-less or Ruby-lite mode?
- Develop detailed list of future tasks
- Nathan Binkert (All Day)
Long Term Tasks
- Data in caches
- I don't see a huge need for this, but if it is forced on us, it'll be good to have a junior grad student (JGS) do it. Basically we have to revive the old DATA_BLK flag, which worked in the 'research tree' when I joined the group. I've turned it on for my own reasons in the past, discovered it was broken, and put no effort into fixing it.
- Python configuration adaptation
- M5 uses a very different configuration system than Ruby, and Ruby's was based heavily on the Simics CLI. I have a hack in place, so that it is possible to change some Ruby parameters at runtime (any at compile-time), but that should be temporary -- we should switch entirely to M5's style. It will little more than a lot of grep'ping and such, but again its a good familiarization excercise. At the same time, we can prune some fat from the configuration parameters.
- M5 fast/timing mode support
- Timing mode is Ruby's normal operation. Its possible that we can use 'fast' mode to warm caches (as we currently do from gzipped traces). This will require some new coding here and there, as well as testing.
- Atomic support
- This may be important for us in the long run. We'll do some kind of horrible nasty hack in the sprint, but we'll want something flexible, generic, and elegant in the long run. We'll need a clever JGS to make that happen. This will actually be quite challenging to integrate into existing protocols, as we need something like an M-locked state to really get the timing right. At the same time, we might also implement true write merging (read-merge-write timing), as an option (the other option is subblocked caches with per-subblock ECC).
- Timing of uncached accesses
- Basically, we need add an 'isUncacheable' flag to network messages and modify SLICC to generate cache controllers that ignore messages with the isUncacheable flag set. That should effectively force the messages to traverse their normal miss path. As an optimization (depending on interconnect, etc.), we can add special routing capability to move straight to the memory controller and/or off-chip bridge. If we *add* some notion of an off-chip actor, that is.
- Fix Directory Memory
- DirectoryMemory.C implements 'generic' directory data by allocating permanent directory state for all cache blocks in the physical memory. This is fine, except it is stored as an array of DirectoryEntry*. The size of that array is MAX_ADDRESS / CACHE_LINE_SIZE_BYTES. That, too, is fine, except when the physical memory space isn't contigous. E.g. when addresses start at 0, run to 0x0ff, then resume at 0x100000 through 0x1000ff. It happens -- for instance when we simulate really large machines with huge complex backplanes. It /can/ happen in M5, too. Ruby also seems to leak memory because of this... even though its not really a leak. The solution is to move the whole data structure from array-based to some kind of balanced tree. Its been on the list of things to do for a long long time.