Original Link: https://www.anandtech.com/show/1335



Introduction

The past year has been quite a ride. With the introduction of the Opteron, and later the Athlon 64, AMD has proven that it can stretch beyond just designing processors. As solid as the original K7 architecture was, AMD has really done something special with the Athlon 64 architecture. Creating a chip that performs well in current systems while taking a step past Intel into x86-64 support is no small feat.

Of course, that's not to say that everything has gone smoothly for AMD. Opteron and Athlon 64 were delayed before their initial release, and we didn't see parts until much later than we had expected. When the parts finally arrived, they performed very well, but the overclocking that AMD had been known for in the Athlon XP line was definitely lacking. To top it off, Athlon 64 was released with a single channel memory controller while its big brother the Opteron had dual channel support (which is conceivably part of the reason the part was so much faster than the Athlon 64 line).

As a result, almost since its launch, enthusiasts have been waiting for Socket 939 to bring dual channel memory to the Athlon 64 line. In addition, the chipsets that will be powering 939 pin motherboards will be capable of a 1GHz HyperTransport bus (with PCI locks), hopefully giving them a little more stability and overclockability than the original Athlon 64 line had. On the desktop side, in the interim, we saw 512kB cache (cheaper mass market) revisions of the Athlon 64 bring us the 2800+ and 3000+ processors, which both performed very well for their price point. This worked well because the Athlon 64 isn't heavily pipelined and is less affected by cache size than the Pentium 4 line of processors.

In addition to expanding into lower cost markets, AMD needed an ultra high end desktop part to show off its potential to the world. The FX-51 and FX-53 have really put AMD on top of the desktop market in terms of gaming performance, though these parts arguably don't have as much value (price to performance wise) as the cheaper but very highly performing Athlon 64 processors. Unfortunately, in order to introduce these enthusiast parts with dual channel memory very quickly, AMD essentially just tweaked and rebranded its Opteron processor and made socket 940 another desktop platform.

Unfortunately, those who want the higher performing (and higher priced) FX processor also need to shell out more money for a higher end motherboard than needed and slower, more expensive, registered RAM. Moving to 939 will bring a single platform to the desktop and give users one less choice to have to make in their purchasing decisions.

One of the major issues with having multiple generations of processors with different memory controllers is that AMD has to be careful about not allowing CPUs with different memory controllers to fit into the sockets of unsupported motherboards. This means that every new generation of memory controller for AMD will bring a new socket to the market. Intel is able to be a little more agile in this area, as the memory controller is in the chipset. This is only an issue when bad decisions are made, such as when Intel decided to adopt RDRAM. They might not have been able to switch back over to DDR so quickly had they fabbed all their processors with a RAMBUS memory controller on the die.

So, today we are seeing the introduction of socket 939 for the AMD Athlon 64 and FX. The bottom line is that we are seeing the same VIA and NVIDIA chipset based motherboards with a different socket attached accepting processors with nothing new but a dual channel unbuffered memory controller. What exactly does this mean, and what kind of performance can we expect?



What's In A New Socket

With the introduction of the 939 platform, we will see a convergence of platforms on the mainstream and high end desktop market for Athlon 64. Until now, the decision between the mainstream Athlon 64 and the FX version of the processor brought with it the problems of choosing between registered memory for a dual channel platform originally targeted at the workstation market and unbuffered memory for a single channel platform. The 939 pin platform brings us the ability to use unbuffered memory (which is slightly faster, cheaper, and more available than registered memory) in a dual channel configuration with either an Athlon 64 or an Athlon 64 FX processor.

Not that any platform is (or ever will be?) future proof, but 939 will provide its adopters with a broader range of options for processors. We have already hit the upper limits of the 940 desktop platform in the FX-53 processor, and the 754 pin Athlon 64 will only reach one speed grade past the current high end 3400+ to the 3700+ (at least as far as current AMD plans indicate).

Of course, midrange and high end platform convergence doesn't mean that the 940 pin and 754 pin platforms will go away. We will continue to see 940 pin processors and platforms in their original market (workstation/server) sporting AMD's Opteron processors. The 754 platform will become the new home of the Athlon XP and AMD's value line of processors. The new Athlon XP will be a trimmed down, 32-bit only version of the Athlon 64.

Aside from bringing together AMD's two current Athlon 64 lines of processors, the 939 also has a couple other benefits that are attractive to users. As we mentioned earlier, the new platform will offer support for unbuffered memory with a dual channel setup. Athlon 64 processors built for the 754 pin socket are limited to single channel memory support, and the 940 pin processors require registered RAM. As such, we should see a performance increase when moving to 939 from both directions: The FX processors will shed the added latency of buffered RAM, while dual channel support will add increased bandwidth to the mainstream Athlon 64 line.
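To put a rough number on the "increased bandwidth" side of that trade, here is a minimal sketch of the theoretical peak figures. It assumes DDR400 modules and 64-bit wide channels, neither of which is spelled out above, so treat the output as a ceiling under those assumptions rather than anything we measured.

```python
# Theoretical peak DDR bandwidth for single and dual channel setups.
# Assumptions (not stated in the article): DDR400 memory, i.e. 400 million
# transfers per second, and 64-bit (8-byte) wide channels.

TRANSFERS_PER_SECOND = 400_000_000  # assumed DDR400
BYTES_PER_TRANSFER = 8              # assumed 64-bit channel width


def peak_bandwidth_gb_s(channels: int) -> float:
    """Return theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER / 1e9


single = peak_bandwidth_gb_s(1)  # single channel, as on socket 754
dual = peak_bandwidth_gb_s(2)    # dual channel, as on socket 939 and 940

print(f"Single channel: {single:.1f} GB/s")
print(f"Dual channel:   {dual:.1f} GB/s")
```

Doubling the channels doubles the theoretical ceiling, but real applications only benefit to the extent that they are actually limited by memory bandwidth.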

There will be less improvement on the FX line of processors, but will either design increase significantly in performance from the enhancements made possible on the 939 platform? Based on the naming of the new chips being released, AMD seems to think so. Of course, that is ultimately what we are here to find out.

Let's take a look at what AMD is bringing out to plug into this socket.



The Chips

Launching on the socket 939 platform today are the Athlon 64 3500+, 3800+ and FX-53 CPUs, as we have mentioned in previous news articles. The 3500+ will run at 2.2GHz while the 3800+ and FX-53 will run at 2.4GHz each. Here's a comparison shot of a 940 pinout and a 939 pinout:


This is a socket 940 processor


This is a socket 939 processor

Aside from the difference in packaging, the only new thing about these processors is their on die memory controller. These parts are the first to be equipped with a memory controller that can handle dual channel unbuffered DDR memory. As we have seen, integrating a memory controller on die has been a successful way of maintaining performance for AMD, but the drawback shows up whenever AMD wants to make its processors work with a new type of memory: part of the silicon needs to be redesigned.

Here's a table that lays out the processors and their specifications adopted from one of our earlier roadmap articles:

Current AMD Athlon 64 and FX processors

Processor                Clock Speed   Cache Size   Dual Channel   Unbuffered
Athlon 64 FX-55 (939)    2.6GHz        1MB          Yes            Yes
Athlon 64 FX-53 (939)    2.4GHz        1MB          Yes            Yes
Athlon 64 FX-53 (940)    2.4GHz        1MB          Yes            No
Athlon 64 3800+ (939)    2.4GHz        512KB        Yes            Yes
3700+ (754) ???          2.4GHz        1MB          No             Yes
Athlon 64 3500+ (939)    2.2GHz        512KB        Yes            Yes
Athlon 64 3400+ (754)    2.2GHz        1MB          No             Yes

The questions are next to the 754 pin 3700+ part because we haven't seen one yet, and we didn't run any numbers for that particular configuration.

The 939 pin Athlon 64 parts we know about have 512kB L2 caches. This means that the 3500+ actually has less cache than the equivalently clocked 3400+ 754 pin CPU (the same is true when comparing the 3700+ 1MB part to the 3800+). If we are to expect performance to match the rating system, this means that AMD expects the addition of a dual channel memory controller to more than make up for a halving of the cache.

The slight name change in equivalently clocked parts has had us wondering for a while if we would see the expected increase in performance. With an increase in rating of 100 points for the 3400+ and 3700+, we would expect to maximally see 2.9% and 2.7% increases in performance. Anything around the 2% mark will be enough for us to be comfortable with the new naming scheme, but we certainly don't want to see too many lower numbers.
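The percentages above are just the 100 point step divided by the lower model number. A minimal sketch of that arithmetic follows, assuming (as the paragraph implicitly does) that the model number can be read as a linear performance index; the function name is ours, for illustration only.

```python
# Expected maximum performance gain implied by a 100 point rating step.
# Assumption: the model number is treated as a linear performance index.

def rating_step_percent(old_rating: int, new_rating: int) -> float:
    """Percent increase from the old model number to the new one."""
    return (new_rating - old_rating) / old_rating * 100.0


for old, new in [(3400, 3500), (3700, 3800)]:
    gain = rating_step_percent(old, new)
    print(f"{old}+ -> {new}+: {gain:.1f}% expected at most")
```

Run as written, this prints 2.9% for the 3400+ to 3500+ step and 2.7% for the 3700+ to 3800+ step, matching the figures quoted above.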

Normally in testing, we consider a less than 3% margin to be essentially equivalent performance, but this time around we will be paying a little closer attention to any small increases in performance in order to determine whether or not this new performance rating is deserved.

The new FX-53 part number has obviously not changed, as its model number is dependent upon clock speed. We also will not see as significant a performance increase moving from the 940 to the 939 pin platform, since upgrading from single channel to dual channel has a higher impact on performance than moving from registered to unbuffered memory. On the 939 platform, the only performance factor that separates the FX series from the rest of the Athlon 64 line will be its 1MB L2 cache size. Of course, to help maintain its status as an enthusiast part, the FX series will also be completely multiplier unlocked.

But there is a caveat to that as well. With the advent of AMD's Cool'n'Quiet (which is similar in effect to Intel's Enhanced SpeedStep), motherboard makers who choose to implement the technology will be able to offer their users downwardly unlocked multipliers for the Athlon 64 platform. Being able to decrease the multiplier is very important for hardcore overclockers, as much higher bus speeds (and thus RAM speeds) are attainable when the core multiplier can be lowered.
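The reason a lower multiplier matters comes down to simple arithmetic: core clock is the multiplier times the reference clock, so dropping the multiplier lets the reference clock (and the memory clock derived from it) climb while the core stays at a speed it can handle. The sketch below assumes a 200MHz stock reference clock and an 11x stock multiplier for a 2.2GHz part; those values and the function names are our assumptions for illustration.

```python
# Relationship between multiplier, reference clock and core clock.
# Assumptions (not from the article): 200MHz stock reference clock and an
# 11x stock multiplier for a 2.2GHz CPU.

STOCK_REFERENCE_MHZ = 200.0  # assumed
STOCK_MULTIPLIER = 11        # assumed


def core_clock_mhz(multiplier: float, reference_mhz: float) -> float:
    """Core clock is simply multiplier times reference clock."""
    return multiplier * reference_mhz


def reference_for_target(target_core_mhz: float, multiplier: float) -> float:
    """Reference clock needed to hold a target core clock at a multiplier."""
    return target_core_mhz / multiplier


stock_core = core_clock_mhz(STOCK_MULTIPLIER, STOCK_REFERENCE_MHZ)

# Hold the core at its stock speed while lowering the multiplier.
for multiplier in (11, 10, 9):
    reference = reference_for_target(stock_core, multiplier)
    print(f"{multiplier}x -> {reference:.1f}MHz reference for "
          f"{stock_core:.0f}MHz core")
```

With the core speed held constant, every step down in multiplier buys more headroom on the bus side, which is exactly what an overclocker chasing memory speed wants.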



The Test

We will be comparing four new processor speeds against the numbers we have already collected over the past few months. Two speed grades will be Athlon 64 (512kB L2 cache parts), and the other two will be FX parts (1MB L2 cache parts). One of the FX parts isn't actually being launched yet, but will be the future FX-55 part. Very fortunately, the FX processors are completely multiplier unlocked, so all I had to do to test FX-55 speeds was to crank up the voltage and multiplier on our FX-53 to 1.55V and 26. Worked like a charm, aside from the issues I experienced across the board.

Performance Test Configuration

Processor(s):
  AMD Athlon XP 3000+
  AMD Athlon 64 3000+
  AMD Athlon 64 3200+
  AMD Athlon 64 3400+
  AMD Athlon 64 3500+ (S939)
  AMD Athlon 64 3800+ (S939)
  AMD Athlon 64 FX53 (S939)
  AMD Athlon 64 FX55 (S939)*
  AMD Athlon 64 FX51
  AMD Athlon 64 FX53
  Intel Pentium 4 3.2GHz EE
  Intel Pentium 4 3.4GHz EE
  Intel Pentium 4 3.2GHz
  Intel Pentium 4 3.0GHz
  Intel Pentium 4 3.2E GHz
RAM:
  2 x 512MB OCZ 3500 Platinum Ltd
  2 x 512MB OCZ 3200 EL ECC Registered 2:3:3
  2 x 512MB Mushkin ECC Registered High Performance 2:3:2
Hard Drive(s):
  Seagate 120GB 7200 RPM (8MB Buffer)
Video AGP & IDE Bus Master Drivers:
  VIA Hyperion 4.51 (12/02/03)
  Intel Chipset Drivers
Video Card(s):
  Sapphire ATI Radeon 9800 PRO 128MB (AGP 8X)
Video Drivers:
  ATI Catalyst 4.1
Operating System(s):
  Windows XP Professional SP1
Motherboards:
  Intel D875PBZ (Intel 875P Chipset)
  FIC K8-800T (VIA K8T800 Chipset)
  ASUS SK8V (VIA K8T800 Chipset)
  MSI MS-6702E (VIA K8T800 Pro Chipset)

* the FX-55 part has not yet been released, but is on AMD's roadmaps.

In setting up the memory on our 939 pin MSI board, we made sure to disable 2T timing in favor of 1T, as memory bandwidth is greatly increased by doing so (and thus performance is impacted to a significant degree). Memory timings on the two unbuffered memory platforms were 2:2:2:6 using the OCZ RAM.

Testing these processors was a very difficult task, as I ran into a large number of stability issues. Winstone had to be run many times over just to complete the benchmark. We covered all the bases we knew how to cover, using a PC Power & Cooling Turbo Cool 510, a ThermalTake Venus 12 and two 120mm case fans (on an open system) to make sure we had stable voltage supplies and adequate cooling. Nothing really seemed to make a difference until we noticed that the 3500+ and FX-55 benchmarks were "easier" to run. This seemed to indicate that the motherboard wasn't supplying enough voltage to the CPU, as the increased voltage added stability to the FX-55 and the 3500+ was just an underclock. This theory wasn't explored completely, as Computex beckons. To be fair, our own Wesley Fink tested a system that was completely and utterly stable from underclocks to overclocks and everything in between using the exact same versions of components across the board. The conclusion we have come to is that we had a "motherboard issue", though we wish we could be more specific. The important thing is that we got all the benchmarks done, and based on Wesley's experience and the time between now and availability, we don't expect there to be any of these kinds of issues. Of course, we'll definitely keep abreast of the situation.



SYSmark 2004

SYSmark 2004 Overall

When running at 2.4GHz, the FX architecture drives past everything else. We are seeing that the 939 pin FX53 is a little bit slower than the 940 pin version. This is not the norm for our benchmarks, as we will see, but the socket 940 solutions do manage to pull slightly ahead in some tests. Our theory on this is that the build quality of a 940 board is a little higher as it is intended to be a workstation or server platform. And, of course, we are using the very high performance SK8V board as our 940 platform. The 3500+ is outperforming the 3400+ as expected.

SYSmark 2004 Internet Content Creation

The content creation suite of SYSmark 2004 shows that the 3800+ and FX-53 939 parts are very similar in performance, meaning that cache size has little impact on this benchmark (as the only difference between those two processors is 512kB of cache).

SYSmark 2004 Office Productivity

We see the same trends in Office productivity that we saw in the content creation suite, except that the FX-53 pulls a little further ahead of the 3800+ (showing that cache size is a little more important in this test).



Business and Content Creation

Business Winstone 2004

Business Winstone 2004 reveals a picture of performance showing all but the slowest socket 939 CPU on top of the rest of the pack. This benchmark shows the 3500+ falling a little bit behind the socket 754 3400+ processor.

Content Creation Winstone 2004

The Content Creation Winstone test reveals a different story than the SYSmark data. Here we see that the only socket 939 part to drop below a CPU from another platform is the 3500+, which drops a couple points below the socket 940 FX-53 (while still leading the 3400+ slightly).



DivX Encoding

DIVX Encoding

The AMD parts announced today still don't have the oomph to push past the P4 EE 3.4GHz processor in its favored benchmark: encoding. The socket 939 parts do a solid job of raising the bar a little bit higher for AMD this time around. All of these numbers make sense based on their performance rating, as equivalently clocked versions of the 939 platform show that the dual channel unbuffered memory controller adds a slight, but consistent, boost in performance to the Athlon 64 line.



DirectX 9 Performance

Aquamark 3

Aquamark

The frames per second score in Aquamark is very graphics card limited, but we do see the 939 parts performing where we would expect them to perform.

Aquamark

The Aquamark CPU score shows that the new platform does help out behind the scenes with very high CPU performance, but those numbers just don't translate into big frames per second yet.

Gunmetal

GunMetal

Even though Gunmetal only incorporates VS2.0, that still puts it in the category of a DX9 benchmark. Everything here is graphics card limited even at 1024x768, but perhaps in the near future when we upgrade the video card we use in CPU and motherboard tests, we'll see some of the impact the rest of the system really has on these games. Of course, even though the range is really tight on these benchmarks, the 939 pin CPUs manage to sweep top honors.



DirectX 8 Performance

Unreal Tournament 2003

Unreal Tournament

The flyby benchmark shows the very same moderate but consistent performance gains we have been seeing throughout most of this test suite. The 939 pin CPUs are clustered near the top of the pack with the occasional visit from the odd socket 940 processor.

Unreal Tournament

In the more CPU limited botmatch benchmark, not much changes at all, though the 3800+ slips in just behind the socket 940 FX-53 this time. Again, the 3500+ has a comfortable margin over the 3400+.

Warcraft III: The Frozen Throne

Warcraft 3

This time we see the socket 939 parts all huddled near the top with the exception of the 3500+ which is barely ahead of the 3400+ in this benchmark. Before anyone asks, VSYNC was not on in this benchmark (and besides, we were running 75Hz).



OpenGL Performance

Quake III Arena

Quake III Arena

As has been mentioned before, these Q3A numbers are so high because I had been using vertex lighting which shows slightly different characteristics than lightmap lighting. In a separate 939 motherboard review, we will be seeing Quake III numbers using the lightmap settings. In this case, however, we can see that the P4 EE parts are again raining on the AMD parade, though the socket 939 CPUs' performance is nothing at which to scoff.

Wolfenstein: Enemy Territory

Wolfenstein: Enemy Territory

This ET benchmark uses the radar file from 3DCenter with the settings on the default for high quality graphics. Here we see the 939 processors performing much better than their counterparts on other platforms. The "FX-55" just seems to jump ahead even at the top of the performance heap.

Jedi Knight: Jedi Academy

Jedi Knight: Jedi Academy

Jedi Academy seems to hit a video card bottleneck at the performance levels pushed by these processors. Our benchmark suite is based on the Radeon 9800 Pro 128, but an NVIDIA card might have been a better card to show this (or any other) OpenGL based benchmark as NVIDIA cards traditionally perform better under OpenGL.



3D Rendering

3DStudio Max R5

3DStudio

This benchmark shows the 940 pin FX processors are able to handle 3D Studio rendering a little bit better, and not even the uberclocked FX-55 can reach into the rendering performance of the P4 EE CPUs here. We are again going to chalk this up to a motherboard design issue as the specs just can't explain the performance here.

Lightwave 7.5

Lightwave

Again, another rendering benchmark, another indication that 940 is faster for 3D content creation type applications. It may only be a by-product of the SK8V, but that doesn't speak poorly of anything. We also see that in this benchmark the 3500+ is a tenth of a second slower than the 3400+. Not a huge difference, but slower is not what we want to see from a higher rated model.



Development Workstation Performance

Quake III Arena Source Compile

Interestingly, removing the latency of the registered RAM from the socket 940 based FX processors really helps speed up compile time. Here we see a whole second decrease in compile time for the FX-53 processor (which translates to a very long time when only talking about 13.4 seconds to begin with). This is interesting because we would expect most of the data we are compiling to be in the L2 cache after the first couple times we compile it, but apparently not even 1MB of cache is enough to keep the memory interface from having an impact on performance.
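To show why one second is a big deal here, the sketch below turns it into a percentage. It assumes the 13.4 second figure is the socket 940 baseline and that the difference is exactly one second; both are our reading of the paragraph above rather than exact chart values.

```python
# How large is a one second saving on a roughly 13 second compile?
# Assumptions: 13.4s is the socket 940 baseline and the saving is exactly
# 1.0s; both are readings of the text, not exact chart values.

baseline_s = 13.4  # assumed socket 940 FX-53 compile time
saved_s = 1.0      # "a whole second"

improved_s = baseline_s - saved_s
reduction_pct = saved_s / baseline_s * 100.0

print(f"Compile time: {baseline_s:.1f}s -> {improved_s:.1f}s")
print(f"Reduction:    {reduction_pct:.1f}%")
```

Under those assumptions the saving works out to roughly 7.5%, far more than the 2% to 3% steps we are looking for elsewhere in this review.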



Comparing Sockets: 939 vs. 940 vs. 754

To get a clearer idea of exactly what Socket 939 brings to the performance table, we brought some real test results to some of the speculation that has been brewing on the web as to whether 939 is really faster than 754. We suspect our objective tests of Socket 939/940/754 will fly in the face of some of the absurd speculation and sloppy test results that are being posted about the new Socket, but the truth is rarely as exciting as controversies created to stand out from the crowd.

The comparison is simple: all 3 sockets have processors that can run at 2.2GHz. To keep the comparison as fair as possible, we tested the 3 sockets (754, 940, and 939) with 3 processors with 1MB of on-chip cache, all running at the same 2.2GHz speed. This gave a head-to-head comparison of the single-channel memory controller of Socket 754 to the Dual-Channel Registered Memory of 940 to the latest Dual-Channel Unbuffered Socket 939.

Sockets were compared using the standard motherboard test suite to give a broad comparison of performance. General Performance was compared using Veritest Multimedia Content Creation 2004 and Business Winstone 2004. Results were also compared in PCMark 2004.

General Performance - 2.2GHz & 1MB Cache

Winstones are usually very static at a given CPU speed on a processor. Even wide variations in memory bandwidth and graphics performance rarely have much impact on the Winstone scores. The increases in Winstone scores were only 2.6 to 3% from 754 to 939, but the pattern was very consistent, with 939 the fastest, 940 close to 939, and 754 slower than either dual-channel socket. PCMark 2004 showed an even more modest spread of 1.4% from the slowest to the fastest 2.2GHz processor.

High End Workstation Performance - SPECViewperf 7.1.1

Workstation performance is more sensitive to memory bandwidth, and we do see a wider range in variation among the 2.2GHz processors in SPECviewperf. 754 to 939 ranged from +6.5% in UGS to an 11.4% increase in DX. Considering the CPUs are all the same core at 2.2GHz, this is a wide variation just from different memory controllers. The pattern was generally the same fastest to slowest order of 939-940-754, except that 940 did outperform 939 in a couple of the SPECviewperf benchmarks.



Comparing Sockets: Gaming Performance

DX9 and Media Encoding benchmarks confirm 939 is the fastest CPU at 2.2GHz.

Gaming Performance - DX9 and Encoding

2-pass Media Encoding was about 6% faster on 939 than 754 at the same speed. DirectX 9 games were closer in performance among the 3 processors, but still showed a consistent 754-940-939 pattern from slowest to fastest.

Gaming Performance - DX8 and OpenGL

Quake 3 and other games based on the Quake OpenGL engine are sensitive to memory bandwidth variations. So it was not a surprise to see Quake 3 increase in performance a bit over 6% from 754 to 939. Across all DX8 games 939 again came out as the top performer.

We did not expect large improvements in performance as Athlon 64 moved from 754 to 939. Since we have found the performance of the Dual-Channel Socket 940 and the Single-Channel 754 to be close when they ran the same speed with the same cache, it was already clear the Athlon 64 was not an architecture that was starved for memory bandwidth like the 'deep-pipes' Pentium 4 design. When P4 went dual-channel the performance improvement was dramatic. Athlon 64 shows more modest increases in performance, but that performance increase is still real and measurable. Dual-Channel 939 is the fastest Athlon 64 socket, followed closely by 940. It appears that the reduced latency of unbuffered memory actually does translate into slightly improved performance for 939. Socket 754 is slower than either of the DC solutions, but the difference between fastest and slowest among the 3 sockets is still relatively small.



Comparing CPUs: 3400+ and 3500+

There have been plenty of rumors trickling out from around the globe that seem to indicate that the 3500+ is a slower processor overall than the 3400+. Of course, whether the new naming scheme is simply a marketing distinction for the new socket or an actually deserved rating is a question we have striven to answer through these tests. If we step back and take a look at most of the benchmarks we ran, we will see these percent differences:

As we can see, six or seven of the benchmarks are at or around the 2% mark we were looking for in calling this part deserving of its performance rating. Most of the other benchmarks still show an increase in performance over the 3400+ even if it's not as much as we would like to see, and only two benchmarks show a decrease in performance. There is a good mix of games, encoding, and compiling (and the Content Creation Winstone is close enough) that show the increases we would expect, and things like DX9 games (graphics limited) and 3D rendering don't always scale the way we would expect. It seems that Lightwave and Business Winstone are very sensitive to cache size, in spite of the increased memory bandwidth provided by the dual channel memory interface.
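For anyone who wants to run the same kind of comparison on their own numbers, here is a minimal sketch of the bookkeeping. The scores in it are made-up placeholders, not our results, and treating 2% as a hard cutoff is an assumption on our part (we have only said "at or around" 2%). Note that timed tests, where lower is better, need the sign flipped.

```python
# Percent difference between two CPUs on one benchmark, with a verdict
# against a target gain. All scores below are hypothetical placeholders,
# NOT measured results, and the hard 2.0% cutoff is an assumption.

def percent_gain(old: float, new: float, lower_is_better: bool = False) -> float:
    """Gain of `new` over `old` in percent; positive means `new` is faster."""
    if lower_is_better:
        return (old - new) / old * 100.0  # less time counts as a gain
    return (new - old) / old * 100.0


def verdict(gain_pct: float, target_pct: float = 2.0) -> str:
    """Classify a gain relative to the target percentage."""
    if gain_pct < 0:
        return "slower"
    if gain_pct >= target_pct:
        return "meets rating"
    return "faster, but under target"


# (name, older CPU score, newer CPU score, lower_is_better), all hypothetical
samples = [
    ("frame rate test (fps)", 100.0, 102.5, False),
    ("render test (seconds)", 50.0, 50.1, True),
    ("index score test", 20.0, 20.2, False),
]

for name, old, new, lower in samples:
    gain = percent_gain(old, new, lower)
    print(f"{name}: {gain:+.1f}% ({verdict(gain)})")
```

The same helper works for any pair of processors; only the placeholder scores need to be replaced with real measurements.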

When all is said and done, it is clear that the 3500+ is a better performer than the 3400+ on average. But what else could AMD have done, call it a 3450+? Well, maybe they're still holding on to that card for a reason, and maybe their tests show that the 2.2GHz 512kB cache dual channel unbuffered CPU really does deserve a rating of 3500+. There is really not enough data pointing toward the 3500+ not living up to its name to get upset with AMD about the rating number.

It is our opinion that the 3500+ is a solid performer that is at least not undeserving of its name. And we have a good feeling that overclocking performance may also help to seal the deal, but we'll have to wait on a final verdict in that arena until we actually get our hands on a 3500+ and aren't reduced to underclocking a 3800+.



Final Words

So the verdict on 939 is that it isn't a revolutionary performer, and it won't bring peace to the world. But socket 939 is really the finishing touch and final polish that the Athlon 64 line has been waiting for.

We have been waiting for this socket for a long time now, and if we lived in a perfect world, we would have seen a socket 939-like solution (with dual channel and all desktop Athlon 64 processors on one platform) from the beginning. Of course, now that it's here, we have reason to rejoice. Socket 754 will become home to the new value line of processors as the current generation of Athlon XP processors fades into the sunset, and 940 pin platforms will still be used for Opteron servers and workstations.

We keep hearing rumors of an Opteron for 939, but we aren't exactly sure why something like that makes sense. Registered and ECC memory support are very important for server and workstation class systems. Stability is the most important factor in such platforms, and taking away such a big part of the equation really doesn't seem logical.

In the final analysis, current socket 754 and socket 940 users won't gain any real value from "upgrading" to socket 939. The new addition of a dual channel memory controller for unbuffered DDR has no doubt given the Athlon 64 line a small performance increase, but it may not be as much as people had been expecting. The main advantages to socket 939 will be the convergence of the Athlon 64 desktop platform, the ability to use unbuffered RAM in conjunction with high end desktop processors, and the warm feeling that comes from knowing there's quite a lot of memory bandwidth under the hood with a dual channel memory controller on die.

The real reason we aren't seeing more intense performance increases from socket 939 is the same reason we don't see huge performance differences between Athlon 64 processors with different sized caches (at least we don't see the variance we see among Pentium 4 based processors): the Athlon 64 is not an incredibly deeply pipelined architecture, and cache misses that result in pipeline stalls don't cause the processor to waste much of its time refilling the pipeline (as is the case with Intel's Netburst architecture in low cache situations). Really, the added bandwidth of dual channel is able to more than make up for the loss of 512kB in cache.

The socket 939 FX-53 absolutely takes the cake in terms of performance (though price will still be a barrier to entry, and an Athlon 64 processor will be a much better value). We are happy with the new line of Athlon 64 processors.

All told, we aren't talking about the be-all and end-all of platforms and performance, but, certainly, anyone who wants an Athlon 64 system should look no further than socket 939 for its flexibility, overclockability, and performance.
