Introduction

 

The initial (1st) phase of the MTCC has been completed.  During this phase the magnet was ramped several times to different flat-top values, up to the full 4 Tesla.  While the magnet was ramping and while it was running at constant current, many global, local and calibration runs were taken by the CSC.  For this exercise the CSC LV was supplied by two Wiener Maraton supplies, which are magnetic-field tolerant.  An important aspect of this running was to understand the performance of the LVPS in the magnetic fringe field.

 

There were a number of start-up problems that resulted in PS trips and unexplained behavior.  Steps were taken to monitor and log the data available from each supply and from the Peripheral Crate Monitor Boards (PCMB) installed in each peripheral crate (PC).  This data was crucial for observing and understanding the behavior of the supplies.

 

History

 

Under the original plan the two Maraton supplies were to provide the LV for only 2 of the 4 Trigger Sectors (TS) in the test.  The other supplies were not received, so each of the two supplies was connected to power 2 TS.

 

 

Details

 

Two Wiener Maraton LVPS were used to power the CSC LV and the PC LV during the 1st phase of the MTCC.  One LVPS, Maraton #1 or M1, powered station ME+1.  The second LVPS, Maraton #2 or M2, powered stations ME+2 and ME+3.  Because of the geometry each supply had identical loads, and each half of a supply powered 9 CSC and 1 PC.  The channel assignments were as follows:

 

Supply                _1  First half                          _2  Second half
Maraton channel       Ch1          Ch2          Ch3           Ch4          Ch5          Ch6
Output current        150A         50A          100A          150A         50A          100A
Load                  Digital_1    PC_1         Analog_1      Digital_2    PC_2         Analog_2
Maraton #1
  Voltage monitor     M1V1         M1V2         M1V3          M1V4         M1V5         M1V6
  Current monitor     M1A1         M1A2         M1A3          M1A4         M1A5         M1A6
Maraton #2
  Voltage monitor     M2V1         M2V2         M2V3          M2V4         M2V5         M2V6
  Current monitor     M2A1         M2A2         M2A3          M2A4         M2A5         M2A6

 

Table 1. Output channel assignments for the two supplies.
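
The monitor names in Table 1 follow a regular pattern: the letter M, the supply number, V or A, and the channel number. A minimal Python sketch of that pattern is given here as an aid for reading the log files. The function name and the dictionary layout are illustrative assumptions and are not part of the DCS software.

```python
# Loads per Maraton channel, taken from Table 1.
CHANNEL_LOADS = {
    1: "Digital_1", 2: "PC_1", 3: "Analog_1",
    4: "Digital_2", 5: "PC_2", 6: "Analog_2",
}


def decode_monitor_name(name):
    """Split a monitor name such as 'M1A2' into its parts.

    Returns (supply, quantity, channel, load). The helper itself is
    hypothetical; only the naming pattern comes from Table 1.
    """
    if len(name) != 4 or name[0] != "M" or name[2] not in ("V", "A"):
        raise ValueError(f"not a monitor name: {name!r}")
    supply = int(name[1])
    channel = int(name[3])
    if supply not in (1, 2) or channel not in CHANNEL_LOADS:
        raise ValueError(f"unknown supply or channel in {name!r}")
    quantity = "voltage" if name[2] == "V" else "current"
    return supply, quantity, channel, CHANNEL_LOADS[channel]


print(decode_monitor_name("M1A2"))
```

For example, M1A2 is the current read back of channel 2 on Maraton #1, which feeds the PC of the first half.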

 

 

Supply half    Trigger sector
M1_1           ME+1/11
M1_2           ME+1/10
M2_1           ME+2/5
M2_2           ME+3/5

 

Table 2. Trigger sector assignments for the two supplies.

                       

Analysis Files

 

The analysis results from the log files are available on the web in the directory listed below.  The analysis was done in XLS files, with one file for each calendar day.

 

Web Directory for results:

https://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/LVPS

 

The XLS files for the Maraton log file analysis are named, for example:

 

maraton_data_pvss_08_28_2006.xls

where the numbers are the month, day, and year.

 

The XLS files for the PCMB log file analysis are named, for example:

 

pcmb_data_pvss_08_22_2006.xls
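
The naming convention above can be written down as a short Python sketch, which may help when fetching the file for a given day. The function name is a hypothetical helper; only the name pattern itself comes from the examples above.

```python
from datetime import date


def log_workbook_name(kind, day):
    """Build a workbook name such as 'maraton_data_pvss_08_28_2006.xls'.

    kind is 'maraton' or 'pcmb'; day is a datetime.date.
    The pattern is month, day, year, as in the examples in the text.
    """
    if kind not in ("maraton", "pcmb"):
        raise ValueError(f"unknown log kind: {kind!r}")
    return f"{kind}_data_pvss_{day:%m_%d_%Y}.xls"


print(log_workbook_name("maraton", date(2006, 8, 28)))
print(log_workbook_name("pcmb", date(2006, 8, 22)))
```

The two calls reproduce the two example file names given above.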

 

Analysis and Timeline

 

Date        Magnet Status                 Monitor Status                        LVPS Status                                Comments
03-Aug                                    none                                  M1 trip                                    first recorded trip
04-Aug                                                                          Trips on M1 & M2                           the start of a period of many trips
14-Aug  M
15-Aug  T                                 Maraton 1 log files
16-Aug  W                                                                       Notice and correct over voltage set point
17-Aug  T   ON 06:00 to 14:00 to 17,500                                         M1 - Type 1 glitch
18-Aug  F                                 3 Maraton log files + PCMB log files

19-Aug  S                                                                       no trips
20-Aug  S                                                                       no trips

21-Aug  M                                                                       M2 - Type 1 glitch
22-Aug  T                                 Maraton new format with status
23-Aug  W                                                                       no trips
24-Aug  T   ON 07:00 to 11:30                                                   no trips
25-Aug  F                                                                       no trips

26-Aug  S   On 07:00                                                            may be problems
27-Aug  S   On all day                                                          both supplies off 09h36

28-Aug  M   magnet trip at 13h35                                                M1A2 trips at 13h18

 

 

Day by Day details

The following section includes data extracted from the ELOG.  These entries were posted by several people.

 

3-Aug

 

First recorded trip in ELOG

103  

Thu Aug 3 15:17:03 2006

Martin von der Mey, mey@mail.cern.ch

LV

Marathon01 tripped

Marathon01 tripped while smooth running. Only
activity going on was reading out TMB VME
counters.

 

4-7  Aug

 

Multiple trips of both Maratons from the 4th through the 7th.  We now also see the PFC module tripping.  The only indications we have of this are lost communication with the supply monitoring and the green light on the toggle switch on the front of the PFC being OFF.

  111  

Fri Aug 4 15:53:04 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton01 tripped again

Maraton01 tripped again while smooth running
and humidity sensor #2 went out of range.
(ERROR)

  109  

Fri Aug 4 12:15:09 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton1 tripped

Maraton1 tripped while init and it did again
after powercycling.(Channels 2 and 5)

  107  

Fri Aug 4 10:14:39 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton02 tripped

Maraton02 tripped again while ramping up system

  106  

Fri Aug 4 01:37:43 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton2 tripped

Maraton2 tripped while downloading firmware
to Me1a (that is supplied from Maraton1 (?))

122  

Mon Aug 7 13:36:57 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton 1 tripped

the maraton unit of ye+1 tripped as well

  121  

Mon Aug 7 13:29:55 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton2 tripped

returned from lunch, finding the ye+2 ac/dc
tripped again.

  120  

Mon Aug 7 12:30:35 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton2 tripped

third trip of AC-DC

  119  

Mon Aug 7 12:23:48 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton2 tripped

we've seen a second trip of the YE+2 AC-DC
converter. waiting for documentation on this
unit, but it appears that the green power

  118  

Mon Aug 7 11:51:45 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton2 tripped

apparently the AC-DC converter which powers
Maraton 2 switched off (green button was
back in the off position). we need to monitor

  117  

Mon Aug 7 11:44:35 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton2 tripped

Maraton2 tripped.

  116  

Mon Aug 7 11:10:55 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton2 tripped

Maraton2 tripped while init maraton1.

  115  

Mon Aug 7 10:20:29 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton1 tripped

Maraton1 tripped. No activity.

  114  

Sun Aug 6 15:02:55 2006

Levchenko, Petr.Levchenko@cern.ch

Configuration

Maraton update

In order to improve Maraton performance jumpers
between 4.5V and 6.6V in CRBs of PC1,2 on
+YE1,2

10-Aug

 

This is an overcurrent on the PC backplane channel: channel 2 has a 50A output (Table 1) and was reading 53A.

 

 136  

Thu Aug 10 11:59:52 2006

Martin von der Mey, mey@mail.cern.ch

LV

Maraton1 tripped

Maraton1 tripped. (channel2=53A)

 

11-Aug

 

More trips. 

 147  

Fri Aug 11 12:33:27 2006

Frank Geurts, frank.geurts@cern.ch

LV

maraton #1 tripped

maraton #1 trip, this time seemingly triggered
by a TMB status request from the hyperdaq
page ... pretty much anything seems to be

  146  

Fri Aug 11 12:22:16 2006

Frank Geurts, frank.geurts@cern.ch

LV

maraton #1 tripped

4th trip ... we are considering to reconfigure
and possibly leaving one crate on maraton
#1 out. until then we do not participate

  145  

Fri Aug 11 12:15:45 2006

Frank Geurts, frank.geurts@cern.ch

LV

maraton #1 tripped

3rd trip of maraton #1. 
 after initializing crate me1a and during
me1b initialization (CCB hardreset).

  144  

Fri Aug 11 12:11:56 2006

Frank Geurts, frank.geurts@cern.ch

LV

maraton #1 tripped

repeated LV related problems, esp. seen on
the me1a crate.
followed by another Maraton #1 trip. (magnet

  143  

Fri Aug 11 11:45:23 2006

Frank Geurts, frank.geurts@cern.ch

LV

maraton #1 tripped

maraton channel #2, during data taking for
run 2264 after ~15k events.

  154  

Fri Aug 11 19:38:16 2006

Dan Holmes, dan.holmes@cern.ch

Trigger

trigger config 11th Aug

summary:

csc ran with a sector 10 (csc 5a)  ME1/3

  153  

Fri Aug 11 18:00:54 2006

Frank Geurts, frank.geurts@cern.ch

LV

maraton #1 trip (post-field)

2nd trip of maraton #1 ... no action or configured
activity on the peripheral crates. bringing
it back online again.

  152  

Fri Aug 11 17:28:38 2006

Frank Geurts, frank.geurts@cern.ch

LV

maraton #1 trip (post-field)

maraton #1 tripped again (no B-field this
time)

 

 

 

14  Aug

 

More trips.  A third supply, M3, was installed, but this supply was not used.

 

 

162  

Mon Aug 14 15:29:17 2006

Frank Geurts, frank.geurts@cern.ch

Other

Maraton #1 trip.

trip #4 and #5 happened during lunch

  161  

Mon Aug 14 14:05:33 2006

Frank Geurts, frank.geurts@cern.ch

Other

Maraton #1 trip.

3rd trip.

  160  

Mon Aug 14 13:55:09 2006

Frank Geurts, frank.geurts@cern.ch

Trigger

run 2290, trigger synch only

trigger synch run 2290(no csc daq data) got
aborted after dt lv troubles.

  159  

Mon Aug 14 13:13:23 2006

Frank Geurts, frank.geurts@cern.ch

 

LV temporarily down

all LVs back up. preparing for a (non-daq
data).

  158  

Mon Aug 14 10:03:04 2006

Frank Geurts, frank.geurts@cern.ch

 

LV temporarily down

all LV will be down for the next ~1hr to allow
petr to install the water cooling for maraton
#3

  157  

Mon Aug 14 10:01:53 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton LV plans

 Maraton LV plans for today/tomorrow:

- enable water cooling to the water-cooled

  156  

Mon Aug 14 09:57:47 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton #1 trip.

2nd trip on maraton #1 today.

  155  

Mon Aug 14 09:22:19 2006

Frank Geurts, frank.geurts@cern.ch

LV

Maraton #1 trip.

maraton #1 tripped within 10min after cold
power-up (no crate activity).

15-Aug  T

 

15/8  18:04:17

repeated maraton #2 trips within minutes after power-up. as of now we consider this unrecoverable and leave the unit out. DO NOT SWITCH IT ON ANYMORE until maraton #3 is in place to take over the peripheral crate channels.

at the end of the day we will prepare the CSCs such that stations me+2 and me+3 can be used again.

 

 

15/8  23:31:09

as recorded by Valery's DCS, a log file showing more details of some of today's maraton trips.

columns: date, time, channel number, voltage, current

 

08/15/2006 15:03:07    1   7.9400    81.3300

08/15/2006 15:03:07    2   6.9600    42.8400

08/15/2006 15:03:07    3   7.9900    22.9500

08/15/2006 15:03:07    4   8.0100    81.8200

08/15/2006 15:03:07    5   6.9300    43.1800

08/15/2006 15:03:07    6   8.0200    22.6300

08/15/2006 15:03:29    1   8.1200    49.5500

08/15/2006 15:03:29    2   1.5400    0.0000

08/15/2006 15:03:29    3   8.0200    4.3300

08/15/2006 15:03:29    4   8.1600    0.4700

08/15/2006 15:03:29    5   1.8100    0.0000

08/15/2006 15:03:29    6   8.0400    0.0000

08/15/2006 15:03:29    1   8.1200    49.5500

08/15/2006 15:03:29    2   1.5400    0.0000

08/15/2006 15:03:29    3   8.0200    4.3300

08/15/2006 15:03:29    4   8.1600    0.4700

08/15/2006 15:03:29    5   1.8100    0.0000

08/15/2006 15:03:29    6   8.0400    0.0000

08/15/2006 15:04:05    1   7.9400    81.3300

08/15/2006 15:04:05    2   6.9600    43.1600

08/15/2006 15:04:05    3   7.9900    22.9500

08/15/2006 15:04:05    4   8.0100    57.9000

08/15/2006 15:04:05    5   6.9500    42.7900

08/15/2006 15:04:05    6   8.0100    15.0800

 

This looks like a real trip because the voltages have not gone to 0 and the currents have dropped but are not identically 0.  Curiously though, it is only off for one read, at most 58 seconds.  Why did it go back on?  This could have been a quick power cycle by the operator, but no mention of it is found in the log. 

 

There is another trip at 15:20:45 that lasts until 15:28:45.  We do not have the status bits but it looks like a trip because the analog read backs all freeze during this time.  The current read backs usually fluctuate by a few LE bits but for a trip the exact same value is read until the supply is reset. 
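
The frozen read-back signature described above can be searched for automatically. The following Python sketch is one possible way to do it, under the assumption that each read is available as a tuple of all read-back values. The function name, the threshold of three identical reads and the sample numbers are illustrative assumptions and are not taken from the log files.

```python
def find_frozen_runs(reads, min_repeats=3):
    """Return (first_index, last_index) for runs of identical consecutive reads.

    reads is a list of tuples, one tuple of read-back values per read.
    min_repeats is an assumed threshold, not a value from the report.
    """
    runs = []
    start = 0
    for i in range(1, len(reads) + 1):
        # Close the current run when the values change or the data ends.
        if i == len(reads) or reads[i] != reads[start]:
            if i - start >= min_repeats:
                runs.append((start, i - 1))
            start = i
    return runs


# Made-up example: two live reads, four frozen reads, one live read.
reads = [
    (81.33, 42.84), (81.40, 43.16),
    (49.55, 0.00), (49.55, 0.00), (49.55, 0.00), (49.55, 0.00),
    (81.33, 43.16),
]
print(find_frozen_runs(reads))
```

The comparison should use all channels of a supply together, because a single voltage read back can legitimately repeat from one read to the next while the supply is running normally.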

 

There are other trips at 15:33, 15:41 and 15:46 at which point the supply is left off.  Again these look like actual trips because most of the values drop and all are frozen.

 

It should be noted that the voltage for M1V6 is normally at 8.01 or 8.02.  On the trip in the table above this voltage value is 8.04, and for ALL of the other trips mentioned above the voltage value is 8.06.  This looks like strong evidence for what we found with Urs the next day.  Please see the 16-Aug entry just below.

 

16-Aug  W

 

16/8  21:39:19

Urs came to SX5 this afternoon and he, Petr and I (Fred) worked on the Maraton LVPS. We disconnected the CANbus from DCS and connected directly to the supplies via a utility program on a laptop. With this program we determined that the control parameters for Maraton 1 (the one that was tripping) were different from those for Maraton 2.

Maraton 1 was tripped OFF before we started, so we read the error bytes and determined that channel #7 caused the trip. Channel #7 in the internal setup corresponds to channel #6 on the DCS page, the analog voltage for the second half of the supply. We also noted that the overvoltage trip for this channel was at 8.00V and the voltage set point was also at 8.00V. For none of the other channels on this supply or the other supply was the overvoltage trip set so close to the voltage set point. In fact the other 3 similar channels on both supplies were each set at 8.40V. Urs theorized that this channel could run with the overvoltage trip set so close, but any change in current could move the voltage output and cause a trip. We set this channel's overvoltage trip to 8.40V as well.

We also looked at all the other parameters on both supplies and found them OK. We also spent some time looking at Maraton #3 in case it becomes necessary to use it as well. The parameters were checked, and it was run and the output voltage values calibrated. It should be ready to be used. All we can do now is run with Maraton #1 and see if there are further trips. Based on its future performance we can decide what to do next.

 

From this point on we set the over voltage trip well above the voltage setting for each channel.
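
A check of this kind can be expressed as a small Python sketch that compares each channel's overvoltage trip with its set point. The function name, the 0.2V minimum margin and the channel labels are assumptions made for illustration. The 8.00V and 8.40V values are those quoted in the 16-Aug entry, while the 8.00V set point assigned to the three similar channels is assumed.

```python
def channels_with_small_margin(settings, min_margin=0.2):
    """Return channels whose overvoltage trip is too close to the set point.

    settings maps a channel label to (set_point_volts, ov_trip_volts).
    min_margin is an assumed threshold chosen for illustration.
    """
    flagged = []
    for channel, (set_point, ov_trip) in sorted(settings.items()):
        if ov_trip - set_point < min_margin:
            flagged.append(channel)
    return flagged


# One channel had trip == set point (8.00 V); the three similar channels
# had the trip at 8.40 V. Their labels and set points are assumed here.
settings = {
    "M1 ch6": (8.00, 8.00),
    "M1 ch3": (8.00, 8.40),
    "M2 ch3": (8.00, 8.40),
    "M2 ch6": (8.00, 8.40),
}
print(channels_with_small_margin(settings))
```

With these inputs only the channel with zero margin is reported.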

 

17-Aug  T

 

17/8 09:53:43

Maraton1 tripped.

 

17/8  12:30:59

we have observed both me1a and me1b crates to lose their settings, usually a tell-tale of a LV glitch.   To prevent the (unconfigured) me1 crates from interfering with the CSC trigger Dan has removed the MPC inputs to the SP.

 

17/8  13:19:52

Just before noon Maraton #1 got a 30 second LV power glitch at 08/17/2006 12:48:15.
Please see attachment.

 

+YE1 Maraton#1 glitch

Date             Time       Ch    Volt      Current

 

08/17/2006 12:47:50    1   7.9400    72.4400

08/17/2006 12:47:50    2   6.9600    43.4800

08/17/2006 12:47:50    3   7.9900    20.7000

08/17/2006 12:47:50    4   8.0100    72.4900

08/17/2006 12:47:50    5   6.9500    43.2600

08/17/2006 12:47:50    6   8.0200    20.8600

08/17/2006 12:48:15    1   0.0000    0.0000      ???    Reason Unknown?

08/17/2006 12:48:15    2   0.0000    0.0000      ???

08/17/2006 12:48:15    3   0.0000    0.0000       ???

08/17/2006 12:48:15    4   0.0000    0.0000       ???

08/17/2006 12:48:15    5   0.0000    0.0000       ???

08/17/2006 12:48:15    6   0.0000    0.0000       ???

08/17/2006 12:48:20    1   8.0000    81.5600

08/17/2006 12:48:20    2   6.9400    43.1300

08/17/2006 12:48:20    3   7.9900    46.9700

08/17/2006 12:48:20    4   7.9700    81.8800

08/17/2006 12:48:20    5   6.9500    43.6500

08/17/2006 12:48:20    6   8.0100    46.2500

 

17/8  13:57:33

reconfiguring the me1 crates (daq and trigger readout remains disabled, see dan's earlier entry) ... waiting for another glitch. Fred** will analyze the maraton slow controls log files to look for any obvious glitches (he observed those in a previous analysis in which one of the channels dropped to ~3V for a few minutes)
... i added the "LV" category: please file any maraton related entry with using this topic type

** See report on 15-Aug

 

This is the first capture of a type 1 glitch. 

 

During this day we see three instances of this glitch, at 10:15, 12:28 and at 14:18.  We have no status bit information.  All three times many of the current  read backs jump up for a few reads, and then we get one read where all values read back are exactly 0. 
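
A read in which every value is exactly 0 can be picked out of the Maraton log with a few lines of Python. The sketch below assumes the early log format shown in the glitch table above (date, time, channel, voltage, current on each line). The function name is hypothetical and the sample lines are an excerpt of that table limited to channels 1 and 2.

```python
from collections import defaultdict


def all_zero_reads(lines):
    """Return the timestamps at which every channel reads exactly 0 V and 0 A."""
    by_time = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        stamp = f"{fields[0]} {fields[1]}"
        volts, amps = float(fields[3]), float(fields[4])
        by_time[stamp].append((volts, amps))
    return [stamp for stamp, values in by_time.items()
            if all(v == 0.0 and a == 0.0 for v, a in values)]


lines = [
    "08/17/2006 12:47:50    1   7.9400    72.4400",
    "08/17/2006 12:47:50    2   6.9600    43.4800",
    "08/17/2006 12:48:15    1   0.0000    0.0000      ???",
    "08/17/2006 12:48:15    2   0.0000    0.0000      ???",
    "08/17/2006 12:48:20    1   8.0000    81.5600",
    "08/17/2006 12:48:20    2   6.9400    43.1300",
]
print(all_zero_reads(lines))
```

Any trailing annotation on a line, such as the question marks in the excerpt, is ignored because only the first five fields are used.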

 

 

18-Aug  F

 

18/8  21:45:54

Since last evening Valeri has had logging for all three Maraton supplies working. They were turned on at about 9:30 this morning by Martin and turned off at 19:40 this evening. The analysis of the log files is complete and shows no trips or other anomalous behavior during the day.

 

 

19-Aug  S

 

19/8  07:59:06

Turned on Maraton 1, 2, 3. checked that turbines were running in PS and PC racks (4). Checked that supplies were on in DCS, and checked that the log files were getting fresh data showing supplies on. I also checked that the PCMB log file was getting fresh data and the values were OK for a PC that is on. Armando will oversee the PS today.

 

20-Aug  S

 

20/8  08:12:41

Turned on Maraton #1 and #2. Checked with the DCS panels, everything is OK. Visually inspected all four crates, OK, turbines are on. Data logging of Maraton and PCMB both show supplies are on and look nominal.
Note that I did not log that I turned the supplies off Saturday. After turning them on Saturday morning I logged off the DCS machine instead of locking it. It has been a while since I last used the computer and I logged out by habit instead of locking it. This killed the panels and Armando could not use them all day. We talked in the evening and I called Valeri, who went to the GB and re-established all the DCS panels. I just locked and unlocked the screen; everything works when you do it right.

 

21-Aug  M

 

21/8  18:50:42

 

Discovered a glitch on Maraton #2 (YES 2!!!) at about 16:37.
We see that all voltage and current read backs from the Maraton are identically 0 for exactly one read.

16:35:56 M2 OK
16:37:03 M2 all 0's
16:38:06 M2 OK

We also see that the voltages to PC 3 and 4 (supplied by Maraton #2) go to zero.

16:35:55 all ok
16:36:35 PC 3 & 4 all 0's
16:37:15 Many 0's
16:37:55 OK

The concurrent 0's in both log files indicate this is a real glitch, not an artifact of the readback. The fact that this occurred in Maraton #2 today and Maraton #1 on Thursday strongly indicates that this is a systematic problem in the supplies, not a single faulty supply. The fact that no problems were seen on Saturday or Sunday hints that this problem may be correlated to 'other' activity in the system, in the GB ...

Copies of the two logging files follow.
Maraton logging file.

08/21/2006 16:35:56 1 7.9400 72.6600 | 8.0000 79.0600 | 7.9400 0.0000
08/21/2006 16:35:56 2 6.9600 43.3200 | 6.9400 43.6000 | 7.8900 0.0000
08/21/2006 16:35:56 3 7.9900 35.3100 | 7.9900 46.9700 | 7.9600 0.0000
08/21/2006 16:35:56 4 8.0100 72.9700 | 7.9700 82.1200 | 7.5500 0.0000
08/21/2006 16:35:56 5 6.9600 42.9500 | 6.9500 43.6500 | 7.5600 0.0000
08/21/2006 16:35:56 6 8.0200 35.4700 | 8.0100 46.2500 | 7.5400 0.0000
08/21/2006 16:35:56 crate=1 1 0 1 0 0 0 1 223 1
08/21/2006 16:35:56 crate=2 1 0 1 0 0 0 1 -32545 1
08/21/2006 16:35:56 crate=3 1 0 1 0 0 0 1 223 1
08/21/2006 16:37:03 1 7.9400 72.6600 | 0.0000 0.0000 | 7.9400 0.0000
08/21/2006 16:37:03 2 6.9600 43.3200 | 0.0000 0.0000 | 7.8900 0.0000
08/21/2006 16:37:03 3 7.9900 35.1500 | 0.0000 0.0000 | 7.9600 0.0000
08/21/2006 16:37:03 4 8.0100 72.9700 | 0.0000 0.0000 | 7.5500 0.0000
08/21/2006 16:37:03 5 6.9600 43.2600 | 0.0000 0.0000 | 7.5600 0.0000
08/21/2006 16:37:03 6 8.0200 35.4700 | 0.0000 0.0000 | 7.5400 0.0000
08/21/2006 16:37:03 crate=1 1 0 1 0 0 0 1 223 1
08/21/2006 16:37:03 crate=2 1 0 1 0 0 0 0 -32546 1  
<<< M2 >>>  S7=0 => power OFF
08/21/2006 16:37:03 crate=3 1 0 1 0 0 0 1 223 1
08/21/2006 16:38:06 1 7.9400 72.6600 | 8.0200 81.5600 | 7.9400 0.0000
08/21/2006 16:38:06 2 6.9600 43.6300 | 6.9400 42.8100 | 7.8900 0.0000
08/21/2006 16:38:06 3 7.9900 35.3100 | 7.9900 47.1000 | 7.9600 0.0000
08/21/2006 16:38:06 4 8.0100 72.9700 | 7.9700 81.6400 | 7.5500 0.0000
08/21/2006 16:38:06 5 6.9600 42.9500 | 6.9300 43.8000 | 7.5600 0.0000
08/21/2006 16:38:06 6 8.0200 35.4700 | 8.0100 22.4500 | 7.5400 0.0000
08/21/2006 16:38:06 crate=1 1 0 1 0 0 0 1 223 1
08/21/2006 16:38:06 crate=2 1 0 1 0 0 0 1 -32545 1
08/21/2006 16:38:06 crate=3 1 0 1 0 0 0 1 223 1
08/21/2

PCMB logging file

===================================================
crate#1
08/21/2006 16:35:55
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:35:55
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.54 1.55
crate#3
08/21/2006 16:35:55
3.3V_1 3.38 3.38 3.36 3.31 3.33 3.34 3.38 3.36 3.34
3.3V_2 3.35 3.37 3.34 3.37 3.35 3.34 3.35 3.33 3.36
5.0V 5.07 5.10 5.02 5.06 5.09 5.07 5.05 5.03 5.10
==== 3.37 5.06 3.37 5.09 1.57 1.55
crate#4
08/21/2006 16:35:55
3.3V_1 3.38 3.34 3.33 3.35 3.30 3.32 3.33 3.38 3.39
3.3V_2 3.33 3.33 3.37 3.32 3.34 3.33 3.33 3.36 3.36
5.0V 5.06 5.10 5.08 5.06 5.05 5.13 5.07 5.06 5.04
==== 3.32 5.07 3.35 5.06 1.55 1.55
===================================================
crate#1
08/21/2006 16:36:35
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:36:35
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.54 1.55
crate#3
08/21/2006 16:36:35
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
crate#4
08/21/2006 16:36:35
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
===================================================
crate#1
08/21/2006 16:37:15
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:37:15
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.55 1.55
crate#3
08/21/2006 16:37:15
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.26 3.38 3.36 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 3.34 3.35 3.33 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
crate#4
08/21/2006 16:37:15
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.40 3.32 3.38 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 3.32 3.33 3.36 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
===================================================
crate#1
08/21/2006 16:37:55
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:37:55
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.54 1.55
crate#3
08/21/2006 16:37:55
3.3V_1 3.38 3.38 3.36 3.31 3.33 3.34 3.38 3.36 3.34
3.3V_2 3.35 3.37 3.33 3.37 3.35 3.34 3.35 3.33 3.36
5.0V 5.07 5.10 5.02 5.06 5.09 5.07 5.05 5.03 5.10
==== 3.37 5.06 3.36 5.09 1.57 1.55
crate#4
08/21/2006 16:37:55
3.3V_1 3.38 3.34 3.33 3.35 3.30 3.32 3.33 3.38 3.39
3.3V_2 3.33 3.33 3.37 3.32 3.34 3.33 3.33 3.35 3.36
5.0V 5.06 5.10 5.08 5.06 5.05 5.13 5.07 5.06 5.04
==== 3.32 5.07 3.35 5.06 1.55 1.55

 

Captured a type 1 glitch in M2 and also saw it with the PCMB log file.  Note that the status words say the PS is OFF, not that it has tripped. 
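
The argument that the glitch is real rests on zero reads appearing in both log files at nearly the same time. A minimal Python sketch of that cross-check is given here. The function name and the 60 second matching window are assumptions (the window roughly matches the spacing of the reads), while the times are those listed in the 21-Aug entry.

```python
from datetime import datetime, timedelta


def concurrent_events(times_a, times_b, window_s=60):
    """Pair up events from two logs that lie within window_s seconds of each other."""
    fmt = "%H:%M:%S"
    window = timedelta(seconds=window_s)
    pairs = []
    for a in times_a:
        for b in times_b:
            gap = abs(datetime.strptime(a, fmt) - datetime.strptime(b, fmt))
            if gap <= window:
                pairs.append((a, b))
    return pairs


maraton_zero_reads = ["16:37:03"]           # M2 all 0's
pcmb_zero_reads = ["16:36:35", "16:37:15"]  # PC 3 & 4 zeros
print(concurrent_events(maraton_zero_reads, pcmb_zero_reads))
```

Both PCMB zero reads fall within the window around the single Maraton zero read, which is the coincidence noted in the entry.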

 

Curiously there is no mention of this event in the ELOG.  And the current read backs before and after show no difference.  So we do not know how this affected the data taking.

 

22-Aug  T

 

22/8  11:19:17

 

Have not observed any glitches for today.

The Maraton file format has been modified by Valeri. The new format greatly facilitates moving the files into Excel and analyzing them. I am also able to analyze the PCMB files rapidly as well.

I have moved copies of the XLS files to the web directory where everyone should be able to view them.

https://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/LVPS

Or go to the I&C web page:
http://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/index.htm

and click on 'LVPS Information'

 

 

22/8  18:04:41

 

Seen often today...ALCTs go into bad state...LV powercycling cures the problem....

ALCT: Slow Control chip ID = 8 version b: day = 7, month = 9, year = 2001
ALCT: Fast Control chip ID = f version f: day = ff, month = ff, year = ffffffff

 

These ALCT problems could be caused by the LVPS.  At this time we do not know the cause, so this type of problem will need to be watched with respect to the LVPS as well as for other possible reasons.

 

23-Aug  W

No trips, data looks OK.  On from 09:00 to about 18:00

 

23/8  21:04:30

 

The LVPS log files were analyzed through 17h30. No problems or glitches were seen in either the Maraton log file or the PCMB log files. The latest Maraton charts and tables are in the web directory:

https://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/LVPS

file: maraton_data_pvss_08_23_2006.xls

 

24-Aug  T

No trips, data looks OK.  On from 09:00 to about 15:00

 

24/8  15:22:01

 

Just looked at LVPS log files for today. No problems.

The following tables are from the Maraton log file. The first row is the minimum value read back during the day, the second row is the maximum value read back. None of the voltage or current readings were 0.0 and the status bits are unchanged.

Maraton 1 >> ME+1
           M1V1   M1A1    M1V2   M1A2    M1V3   M1A3    M1V4   M1A4    M1V5   M1A5    M1V6   M1A6    S1  S2  S3  S4  S5  S6  S7  S8       S9
Minimum    7.94   61.77   6.96   34.37   7.99   20.70   8.01   72.49   6.93   39.47   8.01   20.86   1   0   1   0   0   0   1   223      1
Maximum    7.96   73.11   6.99   43.95   8.01   36.43   8.01   72.97   6.96   43.42   8.02   35.79   1   0   1   0   0   0   1   223      1
Difference >>>> *** *** *** *** ***

Maraton 2 >> ME+2 & ME+3
           M2V1   M2A1    M2V2   M2A2    M2V3   M2A3    M2V4   M2A4    M2V5   M2A5    M2V6   M2A6    S1  S2  S3  S4  S5  S6  S7  S8       S9
Minimum    8.00   81.33   6.94   42.74   7.99   22.90   7.97   81.40   6.87   42.75   8.01   22.45   1   0   1   0   0   0   1   -32545   1
Maximum    8.02   82.25   6.94   44.00   7.99   47.75   7.97   82.12   7.00   43.65   8.01   46.55   1   0   1   0   0   0   1   -32545   1
Difference >>>> *** ***


The following table is the summary for the PCMB read back log file. First the expected values for minimum and maximum are given, and then the actual values from the log file. Note that there are no 0's here either.

Expected Values
minimum 3.33 3.32 3.31 1.54 1.54 3.33 3.31 3.3
maximum 5.1 5.1 5.09 5.09 5.13 5.08 5.07 5.1

Values from Data
minimum 3.33 3.32 3.31 1.54 1.54 3.33 3.31 3.30
maximum 5.10 5.10 5.09 5.09 5.13 5.08 5.07 5.10

The complete Maraton SS has been moved to the web page.
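
The daily summary described in this entry (minimum, maximum and a check for zeros in each column) was made in the spreadsheets. An equivalent Python sketch is shown here for readers who prefer to work from the raw log. The function name is hypothetical, and the sample rows are the M2 channel 1 voltage and current from the three reads around the 21-Aug glitch.

```python
def summarize_columns(rows):
    """Return (minimum, maximum, has_zero) for each column of numeric rows."""
    summary = []
    for column in zip(*rows):
        has_zero = any(value == 0.0 for value in column)
        summary.append((min(column), max(column), has_zero))
    return summary


# (voltage, current) of M2 channel 1 at 16:35:56, 16:37:03 and 16:38:06.
rows = [
    (8.00, 79.06),
    (0.00, 0.00),
    (8.02, 81.56),
]
for column_summary in summarize_columns(rows):
    print(column_summary)
```

On a day without glitches the has_zero entry is False for every column, which is the condition reported for 24-Aug.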

 

25-Aug  F                                                                                         

No trips, data looks OK

 

25/8  10:02:01

 

Problem with turbines...LV is off....

 

The power for SX5 and GB tripped at about 7am.  The welder did not disable the fire alarm.  When the power came back, an AC breaker in SX5 did not come back on, and the turbines in our Maraton racks did not turn on as we had assumed.  We ran about 17 minutes without cooling.

 

26-Aug  S

Power cycle at 08:05:26

 

26/8  21:42:08

 

LV + HV turned off for 1/1/32 in order to reduce the load on maraton 1.

Currently experiencing problems that seem to be peripheral crate based but exhibiting themselves in corrupted data @ fed crate.

Magnet is ramping down to 3T for next hour or so to give subdetectors time to cool down. (multiple overheat probs)

 

CMS was experiencing problems with the copper water cooling circuits.  Other parts of the detector were greatly affected, and it may have been affecting us.  We know that if the PC CRB overheats, the voltage output to the crate slowly drops.  This could be what we were seeing.

 

27-Aug  S

both supplies off 09h36

After re-initialization M1A4 had moved up to 76A from 63A the day before, and M1A6 had moved up to 33.55A from 30.66A the day before.

 

28-Aug  M

No problems all day until:

 

28/8  13:28:36

 

Maraton1 tripped !!!!!!!!!! Since long now...... :(

 

 

13:18:53         Supply M1 trips off. 

 

Time       Supply   M1V1    M1A1     M1V2    M1A2     M1V3    M1A3
13:17:33   1.00     7.94    72.66    6.96    43.16    7.99    36.11
13:18:13   1.00     7.94    72.66    6.96    43.16    7.99    36.11
13:18:53   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:19:33   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:20:13   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:20:53   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:21:33   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:22:13   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:22:53   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:23:33   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:24:13   1.00     7.94    72.66    7.40    56.38    7.99    36.11
13:25:47   1.00     7.94    72.66    6.96    43.00    7.99    36.43
13:26:27   1.00     7.94    72.66    6.96    43.00    7.99    36.27
13:27:07   1.00     7.94    72.66    6.96    43.00    7.99    36.27
13:27:47   1.00     7.94    72.66    6.96    43.16    7.99    36.11
13:28:27   1.00     7.94    72.66    6.96    43.16    7.99    36.11

 

 

The table above has time, supply #, and then 3x(V, A) for supply M1_1.  At 13:18:53 the supply has tripped.  Note that both the voltage read back and the current read back of channel 2 (M1V2, M1A2) have jumped up: the voltage by just under ½ volt, and the current by some 13 amps.  The read back values then remain constant until the supply is turned back on at 13:25:47.  After that the voltage and current return to the same values as before the trip.
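
The size of the jump and the length of the outage can be checked directly from the tabulated reads. The Python lines below are a small worked example; the helper name is hypothetical and the numbers are the M1V2 and M1A2 values and the times from the table above.

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Elapsed seconds between two clock times on the same day."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Channel 2 of M1: last read before the trip and first frozen read.
v_before, a_before = 6.96, 43.16
v_trip, a_trip = 7.40, 56.38

voltage_step = round(v_trip - v_before, 2)   # volts
current_step = round(a_trip - a_before, 2)   # amps
outage_s = seconds_between("13:18:53", "13:25:47")

print(voltage_step, current_step, outage_s)
```

This gives a step of 0.44V and 13.22A, and 414 seconds (just under 7 minutes) between the first frozen read and the first read after the supply was turned back on.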

In the PCMB log file the read backs are nominal through the 13:17:57 read.  They then read 0’s from the 13:18:37 through the 13:24:37 read.  They are back on at the nominal values beginning with the 13:25:17 read.  This data is consistent with M1 tripping off.  There is no visible fluctuation in the read back values just prior to the trip.

The nine status words returned and placed in the log file are:

           

Before trip    1    0    1    0    0    0    1    -32545    1
After trip     0    2    1    0    0    0    0    -32554    1

The table above has the 9 so-called status words.  The first line is the read just before the trip, the second line the read just after.

 

Meaning of Status words in the log file

 

1 GetNoErrors                 1 == no errors
2 GetCurrentFlags             0 == no error flags; otherwise power supply switched off due to over current
3 GetACInLimit                1 == in limit
4 GetOverVoltProtFlags        0 == no error flags; otherwise power supply switched off due to over voltage on terminal (OVP)
5 GetUnderVoltFlags           0 == no error flags
6 GetOverVoltFlags            0 == no error flags; otherwise power supply switched off due to over voltage on sense wire
7 GetPowerOn                  1 == power is on
8 GetCrateStatus              2 bytes of status
9 GetTripIfAnyErrorEnable     1 == enable trip on any error

 

These are the nine status values given in the log file according to the Wiener OPC server notation.  

 

After the trip 4 of them are changed.

 

Word                          Before    After     Change
1 GetNoErrors                 1         0         no errors to errors
2 GetCurrentFlags             0         2         no flags to flags
3 GetACInLimit                1         1         no change
4 GetOverVoltProtFlags        0         0         no change
5 GetUnderVoltFlags           0         0         no change
6 GetOverVoltFlags            0         0         no change
7 GetPowerOn                  1         0         power from ON to OFF
8 GetCrateStatus              -32545    -32554    2 bytes of status
9 GetTripIfAnyErrorEnable     1         1         no change
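The same comparison can be done mechanically on the two log lines.  This is only a sketch: the values are the two reads quoted in this note, the names are the OPC item names listed above, and the function `changed_words` is our own illustration, not something provided by the Wiener software.

```python
# Names of the nine status words, in the order they appear in the log file.
STATUS_NAMES = [
    "GetNoErrors", "GetCurrentFlags", "GetACInLimit", "GetOverVoltProtFlags",
    "GetUnderVoltFlags", "GetOverVoltFlags", "GetPowerOn", "GetCrateStatus",
    "GetTripIfAnyErrorEnable",
]

# The read just before the trip and the read just after it.
before = [1, 0, 1, 0, 0, 0, 1, -32545, 1]
after = [0, 2, 1, 0, 0, 0, 0, -32554, 1]

def changed_words(old, new):
    """Return (word number, name, old value, new value) for each changed word."""
    return [
        (i + 1, STATUS_NAMES[i], o, n)
        for i, (o, n) in enumerate(zip(old, new))
        if o != n
    ]

changes = changed_words(before, after)
for number, name, old, new in changes:
    print(f"{number} {name}: {old} -> {new}")
```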

 

For word 2, GetCurrentFlags, the returned value is 2.  We do not know what extra information this 2 conveys beyond the fact that it is not 0.  Perhaps it means that internal channel 2 is the one which tripped.  That would agree with the analog current data read back, where only channel 2 changed.

 

Word 8 holds status byte 1 and status byte 0 from the supply.  Read as a signed 16-bit number, the values in the log file are:

before trip         -32545 = 0x80DF ==> byte 1 = 1000 0000, byte 0 = 1101 1111

after trip          -32554 = 0x80D6 ==> byte 1 = 1000 0000, byte 0 = 1101 0110

 

Translate to:

For byte 0, bit 0 -> 0 == power OFF

For byte 0, bit 3 -> 0 == error in supply
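The decoding above can be reproduced with a few lines.  This sketch assumes the log value is the two status bytes printed as a signed 16-bit integer, which is consistent with the bit patterns quoted above; `split_status` is a hypothetical helper name.

```python
def split_status(word):
    """Split a signed 16-bit log value into (byte 1, byte 0)."""
    unsigned = word & 0xFFFF          # undo the signed 16-bit representation
    return unsigned >> 8, unsigned & 0xFF

before_hi, before_lo = split_status(-32545)
after_hi, after_lo = split_status(-32554)

# Bits of byte 0 that differ between the two reads (bit 0 = least significant).
changed_bits = [bit for bit in range(8) if ((before_lo ^ after_lo) >> bit) & 1]

print(f"before: byte 1 = {before_hi:08b}, byte 0 = {before_lo:08b}")
print(f"after:  byte 1 = {after_hi:08b}, byte 0 = {after_lo:08b}")
print("changed bits in byte 0:", changed_bits)
```

Only bits 0 and 3 of byte 0 change, and byte 1 is the same in both reads.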

 

21:38  CSC turned OFF, MTCC phase 1 finished.

 

Summary and Comments

 

The first problems we saw were related to trying to draw too much current from the two supplies.  The low voltage from the 50A channels to the PC was running over 45A and peaking at 50A or more.  This caused several trips.  We were also drawing near or over the maximum current available from the PFC modules in the GB, which caused them to trip off via their front panel breaker.  These problems were bypassed by reducing the load on each crate.  First, 3 of the 9 DMB/TMB/RAT board sets in each of the crates were removed, that is, unplugged from the back plane.  This also turned off 3 of the 9 CSCs.  At this reduced load the total current draw was well below the limits.  Within a day 2 of the 3 sets had been replaced and the currents were within limits.

Then we found that M1 had been set up wrong: the over voltage trip level on channel 6 was set at exactly the same point as the voltage setting.  This caused some trips, as seen in the log files.  This was fixed.

Then we began seeing the type 1 trip.  In this, an entire supply goes to 0 output voltage and current for some 30 to 50 seconds and then returns to normal as if nothing had happened.  Our first indication of this was that a crate of boards would spontaneously lose their initialization, just as if there had been a power cycle.

To learn more about this we instituted ad hoc logging of the power supplies to text files.  I say ad hoc because there are long range plans for logging within DCS, but that is not yet in place and we needed something.  The first version logged a single Maraton, M1.  Then all 3 were added, and then the format was changed to be more spreadsheet friendly.  A text log file for the PCMB read back was also instituted.

These log files told us first that the glitch seen before was indeed a very short power cycle of the supply.  The voltage and current output from the entire supply are all 0 for usually a single read.  The PCMB log file told us that this was real and not a read back problem.  It also showed 0 voltage at the same time stamp.  After seeing this only on M1, we then saw the same behavior on M2.  This last trip was seen on the 21st.  Since that time we have not seen this type 1 glitch.
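A scan for this signature, a read in which every voltage and current of a supply is 0, could look like the sketch below.  The row layout (a time stamp followed by the list of V and A values) and the function name `zero_reads` are assumptions made for illustration, and the three sample rows are made up to show the pattern; they are not copied from the log files.

```python
def zero_reads(rows):
    """Return the time stamps of reads where every voltage and current is 0."""
    return [stamp for stamp, values in rows if all(v == 0 for v in values)]

# Illustrative rows only: (time stamp, [V1, A1, V2, A2, V3, A3]).
sample_rows = [
    ("read 1", [7.94, 72.66, 6.96, 43.16, 7.99, 36.11]),
    ("read 2", [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),   # all outputs at 0
    ("read 3", [7.94, 72.66, 6.96, 43.16, 7.99, 36.11]),
]

glitches = zero_reads(sample_rows)
print("all-zero reads:", glitches)
```

Running the same scan on the PCMB log and comparing time stamps is one way to separate a real power cycle from a read back problem.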

We had an over current trip on the 28th just 10 minutes before the magnet itself tripped off.  This does not look like the PS was the cause.  It looks like the current in the PC jumped up and the supply tripped off as it was supposed to.  We do not know why the current went up, because on a restart the crate ran as normal with the same currents as before the trip.  Of course this could be a PS problem and that possibility cannot be ruled out.

 

We had a lot of problems with the Maraton LVPS, but no clear evidence that they are not suited for LHC running or are adversely affected by the magnetic fields.  The problems stemmed from:

1 – we overloaded the supplies,
2 – we had problems with setting and operating the supplies,
3 – at the start our logging of the supplies was non-existent, and finally
4 – we did observe behavior that remains unexplained.

 

Our plans are:

1 – We have installed a 3rd Maraton which will supply power for all 4 PC.  This will allow us to power up the full 60 degree slice and have a good margin of power available on any single Maraton.

2 – We are now better at running the supplies.  But we NEED a Maraton-specific manual for both the power supply and the PFC module.  We have to push to get these.

3 – The logging that was established has proven to be sufficient to understand what is going on.  This will remain in place and we have no plans for modifying it, but could if the need arises.  

4 – There are a few things we can do here.

A – We can set the under current trips on the Maratons so that the type-1 glitch should trip off the supply.  If it doesn’t, then we also learn something.

B – We want to study the behavior of the supplies over the next weeks before the restart of the MTCC, even to the point of deliberately causing trips.  In effect, we would reverse engineer the supplies to become more familiar with their behavior in extremes.