Introduction
The initial, or 1st, phase of the MTCC has been completed. During this phase the magnet was ramped several times to different flat-top values, up to the full 4 Tesla. While the magnet was ramping and running at constant current, many global, local and calibration runs were taken by the CSC. For this exercise the CSC LV was supplied by two Wiener Maraton supplies, which are magnetic field tolerant. An important aspect of this running was to understand the performance of the LVPS in the magnetic fringe field.
There were a number of start-up problems that resulted in PS trips and unexplained behavior. Steps were taken to monitor and log the available monitor data from each supply and from the Peripheral Crate Monitor Boards (PCMB) installed in each P crate. This data was crucial in understanding and observing the behavior of the supplies.
History
Under the original plans the
2 Maraton supplies were to supply the
Details
Two Wiener Maraton LVPS were used to power the CSC LV and the PC LV during the 1st phase of the MTCC. One LVPS, Maraton #1 or M1, powered station ME+1. The second LVPS, Maraton #2 or M2, powered stations ME+2 and ME+3. Because of the geometry each supply had identical loads and powered 9 CSC and 1 PC.
The channel assignments were as follows:
Supply | _1 First half (Ch1 to Ch3) | _2 Second half (Ch4 to Ch6)
Maraton Channel | Ch1 | Ch2 | Ch3 | Ch4 | Ch5 | Ch6
Output current | 150A | 50A | 100A | 150A | 50A | 100A
Load | Digital_1 | PC_1 | Analog_1 | Digital_2 | PC_2 | Analog_2
Maraton #1 Voltage monitor | M1V1 | M1V2 | M1V3 | M1V4 | M1V5 | M1V6
Maraton #1 Current monitor | M1A1 | M1A2 | M1A3 | M1A4 | M1A5 | M1A6
Maraton #2 Voltage monitor | M2V1 | M2V2 | M2V3 | M2V4 | M2V5 | M2V6
Maraton #2 Current monitor | M2A1 | M2A2 | M2A3 | M2A4 | M2A5 | M2A6
Table 1. Output channel assignments for the two supplies.
M1_1 | ME+1/11
M1_2 | ME+1/10
M2_1 | ME+2/5
M2_2 | ME+3/5
Table 2. Trigger sector assignments for the two supplies.
Analysis Files
The analysis results from the log files are available on the web at the directory listed below. The analysis was done in XLS files, one file for each calendar day.
Web Directory for results:
https://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/LVPS
The XLS files for the Maraton log file analysis are named, for example:
maraton_data_pvss_08_28_2006.xls
where the number is the month, day, and year.
The XLS files for the PCMB log file analysis are named, for example:
pcmb_data_pvss_08_22_2006.xls
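As an illustration of the naming convention above, the following is a minimal Python sketch that recovers the log type and calendar day from a file name. The function name `parse_log_name` and the regular expression are assumptions made for this example; they are inferred only from the two sample names given in the text.

```python
import re
from datetime import date

# Assumed pattern, inferred from the two example file names in the text:
# <type>_data_pvss_<month>_<day>_<year>.xls
_NAME_RE = re.compile(r"^(maraton|pcmb)_data_pvss_(\d{2})_(\d{2})_(\d{4})\.xls$")


def parse_log_name(filename):
    """Return (log_type, calendar_day) for an analysis file name."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    log_type, month, day, year = match.groups()
    return log_type, date(int(year), int(month), int(day))


print(parse_log_name("maraton_data_pvss_08_28_2006.xls"))
```

A name that does not follow the pattern raises `ValueError`, so a misnamed file is noticed rather than silently assigned to the wrong day.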
Analysis and Timeline
Date | Magnet Status | Monitor Status | LVPS Status | Comments
03-Aug | | none | M1 trip | first recorded trip
04-Aug | | | Trips on M1 & M2 | the start of a period of many trips
14-Aug M | | | |
15-Aug T | | Maraton 1 log files | |
16-Aug W | | | | Notice and correct over voltage set point
17-Aug T | ON 06:00 to 14:00 to 17,500 | | M1 Type 1 glitch |
18-Aug F | | 3 Maraton log files + PCMB log files | |
19-Aug S | | | no trips |
20-Aug S | | | no trips |
21-Aug M | | | M2 Type 1 glitch |
22-Aug T | | Maraton new format with status | |
23-Aug W | | | no trips |
24-Aug T | ON 07:00 to 11:30 | | no trips |
25-Aug F | | | no trips |
26-Aug S | On 07:00 | | | may be problems
27-Aug S | On all day | | both supplies off 09h36 |
28-Aug M | magnet trip at 13h35 | | M1A2 trips at 13h18 |
Day by Day details
The following section includes data extracted from the ELOG. These files were posted by several people.
3-Aug
First recorded trip in ELOG
Marathon01 tripped while smooth running. Only
4-7 Aug
Multiple trips of both Maratons from the 4th through the 7th. Now we see the PFC module tripping also. The only indications we have for this are lost communication with the supply monitoring and the green light on the toggle switch on the front of the PFC being OFF.
Maraton01 tripped again while smooth running
Maraton1 tripped while init and it did again
Maraton02 tripped again while ramping up system
Maraton2 tripped while downloading firmware
the maraton unit of ye+1 tripped as well
returned from lunch, finding the ye+2 ac/dc
third trip of AC-DC
we've seen a second trip of the YE+2 AC-DC
apparently the AC-DC converter which powers
Maraton2 tripped.
Maraton2 tripped while init maraton1.
Maraton1 tripped. No activity.
In order to improve Maraton performance jumpers
10-Aug
This is an over current on the PC backplane channel.
Maraton1 tripped. (channel2=53A)
11-Aug
More trips.
maraton #1 trip, this time seemingly triggered
4th trip ... we are considering to reconfigure
3rd trip of maraton #1.
repeated
maraton channel #2, during data taking for
summary:
2nd trip of maraton #1 ... no action or configured
maraton #1 tripped again (no B-field this
14 Aug
More trips, and M3 is installed, but this supply was not used.
trip #4 and #5 happened during lunch
3rd trip.
trigger synch run 2290(no csc daq data) got
all LVs back up. preparing for a (non-daq
all
Maraton LV plans for today/tomorrow:
2nd trip on maraton #1 today.
maraton #1 tripped within 10min after cold
15-Aug T
15/8 18:04:17
repeated maraton #2 trips within minutes after power-up. as of now we consider this unrecoverable and leave the unit out. DO NOT SWITCH IT ON ANYMORE until maraton #3 is in place to take over the peripheral crate channels.
at the end of the day we will prepare the CSCs such that stations me+2 and me+3 can be used again.
15/8 23:31:09
as recorded by Valery's DCS, a log file showing more details of some of today's maraton trips.
columns: date, time, channel number, voltage, current
08/15/2006 15:03:07 1 7.9400 81.3300
08/15/2006 15:03:07 2 6.9600 42.8400
08/15/2006 15:03:07 3 7.9900 22.9500
08/15/2006 15:03:07 4 8.0100 81.8200
08/15/2006 15:03:07 5 6.9300 43.1800
08/15/2006 15:03:07 6 8.0200 22.6300
08/15/2006 15:03:29 1 8.1200 49.5500
08/15/2006 15:03:29 2 1.5400 0.0000
08/15/2006 15:03:29 3 8.0200 4.3300
08/15/2006 15:03:29 4 8.1600 0.4700
08/15/2006 15:03:29 5 1.8100 0.0000
08/15/2006 15:03:29 6 8.0400 0.0000
08/15/2006 15:03:29 1 8.1200 49.5500
08/15/2006 15:03:29 2 1.5400 0.0000
08/15/2006 15:03:29 3 8.0200 4.3300
08/15/2006 15:03:29 4 8.1600 0.4700
08/15/2006 15:03:29 5 1.8100 0.0000
08/15/2006 15:03:29 6 8.0400 0.0000
08/15/2006 15:04:05 1 7.9400 81.3300
08/15/2006 15:04:05 2 6.9600 43.1600
08/15/2006 15:04:05 3 7.9900 22.9500
08/15/2006 15:04:05 4 8.0100 57.9000
08/15/2006 15:04:05 5 6.9500 42.7900
08/15/2006 15:04:05 6 8.0100 15.0800
This looks like a real trip because the voltages have not gone to 0 and the currents have dropped but are not identically 0. Curiously though, it is only off for one read, at most 58 seconds. Why did it go back on? This could have been a quick power cycle by the operator, but no mention of it is found in the log.
There is another trip at 15:20:45 that lasts until 15:28:45. We do not have the status bits, but it looks like a trip because the analog read backs all freeze during this time. The current read backs usually fluctuate by a few LE bits, but for a trip the exact same value is read until the supply is reset.
There are other trips at 15:33, 15:41 and 15:46, at which point the supply is left off. Again these look like actual trips because most of the values drop and all are frozen.
It should be noted that the voltage for M1V6 is normally at 8.01 or 8.02. On the trip in the table above this voltage value is 8.04, and for ALL of the other trips mentioned above the voltage value is 8.06. This looks like strong evidence for what we found with Urs the next day. Please see the 16-Aug entry just below.
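The frozen-read-back signature described above (the exact same values read repeatedly until the supply is reset) can be searched for mechanically. The following Python sketch is only an illustration and is not the method used for the XLS analysis; the function name `find_frozen_runs` and the threshold of 3 identical reads are assumptions chosen for the example.

```python
def find_frozen_runs(reads, min_repeats=3):
    """Return (first_index, last_index) pairs of runs of identical reads.

    reads is a list of tuples, one tuple of read back values per log read.
    A run is reported only if it contains at least min_repeats reads
    (an assumed threshold, since normal reads fluctuate slightly).
    """
    runs = []
    start = 0
    for i in range(1, len(reads) + 1):
        # Close the current run at the end of the data or when a value changes.
        if i == len(reads) or reads[i] != reads[start]:
            if i - start >= min_repeats:
                runs.append((start, i - 1))
            start = i
    return runs


# Illustrative values only, not copied from a log file.
example = [
    (7.94, 81.33),
    (7.94, 81.17),
    (8.06, 49.55),
    (8.06, 49.55),
    (8.06, 49.55),
    (7.94, 81.33),
]
print(find_frozen_runs(example))
```

Without the status bits, a frozen run is only a hint of a trip; a supply under a very steady load could in principle produce the same pattern.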
16-Aug W
16/8 21:39:19
Urs came to SX5 this afternoon and he, Petr and I (Fred) worked on the Maraton LVPS. We disconnected the CANbus from DCS and connected directly to the supplies via a utility program on a laptop. With this program we determined that the control parameters for Maraton 1 (the one that was tripping) were different from those for Maraton 2. Maraton 1 was tripped OFF before we started, so we read the error bytes and determined that channel #7 caused the trip. Channel #7 on the internal setup corresponds to channel #6 on the DCS page, the analog voltage for the second half of the supply. We also noted that the overvoltage trip for this channel was at 8.00V and the voltage set point was also at 8.00V. For none of the other channels on this supply, nor on the other supply, was the overvoltage trip set so close to the voltage set point. In fact the other 3 similar channels on both supplies were each set at 8.40V. Urs theorized that this channel could run with the overvoltage set so close, but any changes in current could move the voltage output and cause a trip. We set this channel's overvoltage trip to 8.40V as well. We also looked at all the other parameters on both supplies and found them OK. We also spent some time looking at Maraton #3 in case it becomes necessary to use it as well. The parameters were checked, and it was run and the output voltage values calibrated. It should be ready to be used.
All we can do now is run with Maraton #1 and see if there are further trips. Based on its future performance we can decide what to do next.
From this point on we set the over voltage trip well above the voltage setting for each channel.
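The check made by hand in the 16-Aug entry, comparing each channel's over voltage trip level with its voltage set point, can be written down as a small Python sketch. The function name, the dictionary layout and the 0.2 V minimum margin are assumptions for illustration only; the text gives just the 8.00V and 8.40V values.

```python
def channels_with_thin_ovp_margin(settings, min_margin=0.2):
    """List channels whose over voltage trip is too close to the set point.

    settings maps a channel name to (set_point_volts, ovp_trip_volts).
    min_margin is an assumed threshold in volts, not a Wiener specification.
    """
    flagged = []
    for name, (set_point, ovp_trip) in sorted(settings.items()):
        margin = ovp_trip - set_point
        if margin < min_margin:
            flagged.append((name, round(margin, 3)))
    return flagged


# Values from the 16-Aug entry: the faulty channel versus a similar channel.
settings = {
    "M1 Ch6": (8.00, 8.00),  # trip level equal to the set point
    "M1 Ch3": (8.00, 8.40),  # trip level as found on the similar channels
}
print(channels_with_thin_ovp_margin(settings))
```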
17-Aug T
17/8 09:53:43
Maraton1 tripped.
17/8 12:30:59
we have observed both me1a and me1b crates to lose their settings, usually a tell-tale of a
17/8 13:19:52
Just before noon Maraton #1 got a 30 second LV power glitch at 08/17/2006 12:48:15. Please see attachment?
+YE1 Maraton#1 glitch
Date Time Ch Volt Current
17/8 13:57:33
reconfiguring the me1 crates (daq and trigger readout remains disabled, see dan's earlier entry) ... waiting for another glitch. Fred** will analyze the maraton slow controls log files to look for any obvious glitches (he observed those in a previous analysis in which one of the channels dropped to ~3V for a few minutes)
... i added the "
** See report on 15-Aug
This is the first capture of a type 1 glitch.
During this day we see three instances of this glitch, at 10:15, 12:28 and at 14:18. We have no status bit information. All three times many of the current read backs jump up for a few reads, and then we get one read where all values read back are exactly 0.
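The defining feature of the type 1 glitch as described above is a read in which every value is exactly 0. A minimal Python sketch of such a search is given below; the function name and the row layout are assumptions for this example, and the sample values are illustrative rather than taken from a log file.

```python
def find_all_zero_reads(rows):
    """Return the timestamps of reads in which every value is exactly 0.

    rows is a list of (timestamp, values) pairs, where values holds the
    voltage and current read backs of one supply for that read.
    """
    return [ts for ts, values in rows if values and all(v == 0.0 for v in values)]


# Illustrative values only, not copied from a log file.
rows = [
    ("10:14:00", [7.94, 72.66, 6.96, 43.32]),
    ("10:15:05", [0.0, 0.0, 0.0, 0.0]),
    ("10:16:10", [7.94, 72.66, 6.96, 43.16]),
]
print(find_all_zero_reads(rows))
```

A read with only some values at 0 is deliberately not reported, since that pattern is what a channel trip or an unused channel looks like.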
18-Aug F
18/8 21:45:54
Since last evening Valeri has had logging for all three Maraton supplies working. They were turned on at about 9:30 this morning by Martin and turned off at 19:40 this evening. The analysis of the log files is complete and shows no trips or other anomalous behavior during the day.
19-Aug S
19/8 07:59:06
Turned on Maraton 1, 2, 3. Checked that turbines were running in PS and PC racks (4). Checked that supplies were on in DCS, and checked that the log files were getting fresh data showing supplies on. I also checked that the PCMB log file was getting fresh data and the values were OK for a PC that is on. Armando will oversee the PS today.
20-Aug S
20/8 08:12:41
Turned on Maraton #1 and #2. Checked with the DCS panels, everything is OK. Visually inspected all four crates, OK, turbines are on. Data logging of Maraton and PCMB both show supplies are on and look nominal.
Note that I did not log that I turned the supplies off Saturday. After turning them on Saturday morning I logged off the DCS machine instead of locking it. It has been a while since I last used the computer, and I logged out by habit instead of locking it. This killed the panels and Armando could not use them all day. We talked in the evening and I called Valeri, who went to the GB and re-established all the DCS panels. I just locked and unlocked the screen; everything works when you do it right.
21-Aug M
21/8 18:50:42
Discovered a glitch on Maraton #2 (YES 2!!!) at about 16:37.
We see that all voltage and current read backs from the Maraton are identically 0 for exactly one read.
16:35:56 M2 OK
16:37:03 M2 all 0's
16:38:06 M2 OK
We also see that the voltages to PC 3 and 4 (supplied by Maraton #2) go to zero.
16:35:55 all ok
16:36:35 PC 3 & 4 all 0's
16:37:15 Many 0's
16:37:55 OK
The concurrent 0's in both log files indicate this is a real glitch, not an artifact of the readback. The fact that this occurred in Maraton #2 today and Maraton #1 on Friday strongly indicates that this is a systematic problem in the supplies, not a single faulty supply. The fact that no problems were seen on Saturday or Sunday hints that this problem may be correlated to 'other' activity in the system, in the GB ...
Copies of the two logging files follow.
Maraton logging file.
08/21/2006 16:35:56 1 7.9400 72.6600 | 8.0000 79.0600 | 7.9400 0.0000
08/21/2006 16:35:56 2 6.9600 43.3200 | 6.9400 43.6000 | 7.8900 0.0000
08/21/2006 16:35:56 3 7.9900 35.3100 | 7.9900 46.9700 | 7.9600 0.0000
08/21/2006 16:35:56 4 8.0100 72.9700 | 7.9700 82.1200 | 7.5500 0.0000
08/21/2006 16:35:56 5 6.9600 42.9500 | 6.9500 43.6500 | 7.5600 0.0000
08/21/2006 16:35:56 6 8.0200 35.4700 | 8.0100 46.2500 | 7.5400 0.0000
08/21/2006 16:35:56 crate=1 1 0 1 0 0 0 1 223 1
08/21/2006 16:35:56 crate=2 1 0 1 0 0 0 1 -32545 1
08/21/2006 16:35:56 crate=3 1 0 1 0 0 0 1 223 1
08/21/2006 16:37:03 1 7.9400 72.6600 | 0.0000 0.0000 | 7.9400 0.0000
08/21/2006 16:37:03 2 6.9600 43.3200 | 0.0000 0.0000 | 7.8900 0.0000
08/21/2006 16:37:03 3 7.9900 35.1500 | 0.0000 0.0000 | 7.9600 0.0000
08/21/2006 16:37:03 4 8.0100 72.9700 | 0.0000 0.0000 | 7.5500 0.0000
08/21/2006 16:37:03 5 6.9600 43.2600 | 0.0000 0.0000 | 7.5600 0.0000
08/21/2006 16:37:03 6 8.0200 35.4700 | 0.0000 0.0000 | 7.5400 0.0000
08/21/2006 16:37:03 crate=1 1 0 1 0 0 0 1 223 1
08/21/2006 16:37:03 crate=2 1 0 1 0 0 0 0 -32546 1 <<< M2 >> S7=0 => power OFF
08/21/2006 16:37:03 crate=3 1 0 1 0 0 0 1 223 1
08/21/2006 16:38:06 1 7.9400 72.6600 | 8.0200 81.5600 | 7.9400 0.0000
08/21/2006 16:38:06 2 6.9600 43.6300 | 6.9400 42.8100 | 7.8900 0.0000
08/21/2006 16:38:06 3 7.9900 35.3100 | 7.9900 47.1000 | 7.9600 0.0000
08/21/2006 16:38:06 4 8.0100 72.9700 | 7.9700 81.6400 | 7.5500 0.0000
08/21/2006 16:38:06 5 6.9600 42.9500 | 6.9300 43.8000 | 7.5600 0.0000
08/21/2006 16:38:06 6 8.0200 35.4700 | 8.0100 22.4500 | 7.5400 0.0000
08/21/2006 16:38:06 crate=1 1 0 1 0 0 0 1 223 1
08/21/2006 16:38:06 crate=2 1 0 1 0 0 0 1 -32545 1
08/21/2006 16:38:06 crate=3 1 0 1 0 0 0 1 223 1
08/21/2
PCMB logging file
===================================================
crate#1
08/21/2006 16:35:55
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:35:55
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.54 1.55
crate#3
08/21/2006 16:35:55
3.3V_1 3.38 3.38 3.36 3.31 3.33 3.34 3.38 3.36 3.34
3.3V_2 3.35 3.37 3.34 3.37 3.35 3.34 3.35 3.33 3.36
5.0V 5.07 5.10 5.02 5.06 5.09 5.07 5.05 5.03 5.10
==== 3.37 5.06 3.37 5.09 1.57 1.55
crate#4
08/21/2006 16:35:55
3.3V_1 3.38 3.34 3.33 3.35 3.30 3.32 3.33 3.38 3.39
3.3V_2 3.33 3.33 3.37 3.32 3.34 3.33 3.33 3.36 3.36
5.0V 5.06 5.10 5.08 5.06 5.05 5.13 5.07 5.06 5.04
==== 3.32 5.07 3.35 5.06 1.55 1.55
===================================================
crate#1
08/21/2006 16:36:35
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:36:35
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.54 1.55
crate#3
08/21/2006 16:36:35
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
crate#4
08/21/2006 16:36:35
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
===================================================
crate#1
08/21/2006 16:37:15
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:37:15
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.55 1.55
crate#3
08/21/2006 16:37:15
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.26 3.38 3.36 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 3.34 3.35 3.33 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
crate#4
08/21/2006 16:37:15
3.3V_1 0.00 0.00 0.00 0.00 0.00 0.40 3.32 3.38 0.00
3.3V_2 0.00 0.00 0.00 0.00 0.00 3.32 3.33 3.36 0.00
5.0V 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
==== 0.00 0.00 0.00 0.00 0.00 0.00
===================================================
crate#1
08/21/2006 16:37:55
3.3V_1 3.34 3.34 3.32 3.35 3.31 3.34 3.34 3.35 3.37
3.3V_2 3.35 3.37 3.35 3.34 3.33 3.34 3.35 3.33 3.34
5.0V 5.03 5.07 5.10 5.07 5.09 5.07 5.07 5.06 5.08
==== 3.36 5.06 3.35 5.06 1.56 1.55
crate#2
08/21/2006 16:37:55
3.3V_1 3.37 3.37 3.34 3.34 3.33 3.36 3.34 3.31 3.30
3.3V_2 3.39 3.39 3.34 3.33 3.33 3.32 3.35 3.33 3.36
5.0V 5.09 5.07 5.05 5.04 5.06 5.10 5.08 5.07 5.05
==== 3.33 5.08 3.37 5.05 1.54 1.55
crate#3
08/21/2006 16:37:55
3.3V_1 3.38 3.38 3.36 3.31 3.33 3.34 3.38 3.36 3.34
3.3V_2 3.35 3.37 3.33 3.37 3.35 3.34 3.35 3.33 3.36
5.0V 5.07 5.10 5.02 5.06 5.09 5.07 5.05 5.03 5.10
==== 3.37 5.06 3.36 5.09 1.57 1.55
crate#4
08/21/2006 16:37:55
3.3V_1 3.38 3.34 3.33 3.35 3.30 3.32 3.33 3.38 3.39
3.3V_2 3.33 3.33 3.37 3.32 3.34 3.33 3.33 3.35 3.36
5.0V 5.06 5.10 5.08 5.06 5.05 5.13 5.07 5.06 5.04
==== 3.32 5.07 3.35 5.06 1.55 1.55
Captured a type 1 glitch in M2 and also saw it in the PCMB log file. Note that the status words say the PS is OFF, not that it has tripped.
Curiously there is no mention of this event in the ELOG, and the current read backs before and after show no difference. So we do not know how this affected the data taking.
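The argument used above, that 0's appearing in both the Maraton and the PCMB log files at about the same time indicate a real glitch rather than a readback artifact, can be sketched in Python as follows. The two logs are read at different times, so a tolerance window is needed; the function name and the 60 second window are assumptions for this example, and all time stamps are assumed to be from the same day.

```python
from datetime import datetime, timedelta


def confirmed_glitches(maraton_zero_times, pcmb_zero_times, window_s=60):
    """Return Maraton all-zero read times that have a nearby PCMB zero read.

    Times are "HH:MM:SS" strings from the same day. window_s is an assumed
    tolerance in seconds, chosen to be about one logging interval.
    """
    fmt = "%H:%M:%S"
    window = timedelta(seconds=window_s)
    pcmb = [datetime.strptime(t, fmt) for t in pcmb_zero_times]
    confirmed = []
    for t in maraton_zero_times:
        m = datetime.strptime(t, fmt)
        if any(abs(m - p) <= window for p in pcmb):
            confirmed.append(t)
    return confirmed


# Time stamps from the 21-Aug entry.
print(confirmed_glitches(["16:37:03"], ["16:36:35", "16:37:15"]))
```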
22-Aug T
22/8 11:19:17
Have not observed any glitches for today.
The Maraton file format has been modified by Valeri. The new format greatly
facilitates moving the files into Excel and analyzing them. I am also able to
analyze the PCMB files rapidly as well.
I have moved copies of the XLS files to the web directory where everyone should
be able to view them.
https://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/LVPS
Or go to the I&C web page:
http://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/index.htm
and click on 'LVPS Information'
22/8 18:04:41
Seen often today...ALCTs go into bad state...
ALCT: Slow Control chip ID = 8 version b: day = 7, month = 9, year = 2001
ALCT: Fast Control chip ID = f version f: day = ff, month = ff, year = ffffffff
These ALCT problems could be caused by the LVPS. At this time we do not know what the cause is, and this type of problem will need to be watched with respect to the LVPS as well as for other reasons.
23-Aug W
No trips, data looks OK. On from 09:00 to about 18:00
23/8 21:04:30
The LVPS log files were analyzed through 17h30. No problems or glitches were seen in either the Maraton log file or the PCMB log files. The latest Maraton charts and tables are in the web directory:
https://cms-emu-slicetest.web.cern.ch/cms-emu-slicetest/904/LVPS
file: maraton_data_pvss_08_23_2006.xls
24-Aug T
No trips, data looks OK. On from 09:00 to about 15:00
24/8 15:22:01
Just looked at LVPS log files for today. No problems.
The following tables are from the Maraton log file. The first row is the minimum value read back during the day, the second row is the maximum value read back. None of the voltage or current readings were 0.0 and the status bits are unchanged.
Maraton 1 >> ME+1
Minimum 7.94 61.77 6.96 34.37 7.99 20.70 8.01 72.49 6.93 39.47 8.01 20.86 1 0 1 0 0 0 1 223 1
Maximum 7.96 73.11 6.99 43.95 8.01 36.43 8.01 72.97 6.96 43.42 8.02 35.79 1 0 1 0 0 0 1 223 1
Difference >>>> *** *** *** *** ***
Date Time M1V1 M1A1 M1V2 M1A2 M1V3 M1A3 M1V4 M1A4 M1V5 M1A5 M1V6 M1A6 S1 S2 S3 S4 S5 S6 S7 S8 S9
Maraton 2 >> ME+2 & ME+3
Minimum 8.00 81.33 6.94 42.74 7.99 22.90 7.97 81.40 6.87 42.75 8.01 22.45 1 0 1 0 0 0 1 -32545 1
Maximum 8.02 82.25 6.94 44.00 7.99 47.75 7.97 82.12 7.00 43.65 8.01 46.55 1 0 1 0 0 0 1 -32545 1
Difference >>>> *** ***
Date Time M1V1 M1A1 M1V2 M1A2 M1V3 M1A3 M1V4 M1A4 M1V5 M1A5 M1V6 M1A6 S1 S2 S3 S4 S5 S6 S7 S8 S9
The following table is the summary for the PCMB read back log file. First the
expected values for minimum and maximum are given, and then the actual values
from the log file. Note that there are no 0's here either.
Expected Values
minimum 3.33 3.32 3.31 1.54 1.54 3.33 3.31 3.3
maximum 5.1 5.1 5.09 5.09 5.13 5.08 5.07 5.1
Values from Data
minimum 3.33 3.32 3.31 1.54 1.54 3.33 3.31 3.30
maximum 5.10 5.10 5.09 5.09 5.13 5.08 5.07 5.10
The complete Maraton spreadsheet has been moved to the web page.
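The daily check used in this entry (minimum and maximum of every logged column, plus a test that no reading is 0.0) was done in the spreadsheets. As a hedged illustration only, the same summary could be computed in Python as below; the function names are assumptions for this example and the rows are a few illustrative reads, not a full day of data.

```python
def column_min_max(rows):
    """Return (minimum, maximum, changed) lists, one entry per column.

    rows is a list of equal-length tuples of read back values. changed[i]
    is True when column i took more than one value during the period.
    """
    columns = list(zip(*rows))
    minimum = [min(col) for col in columns]
    maximum = [max(col) for col in columns]
    changed = [lo != hi for lo, hi in zip(minimum, maximum)]
    return minimum, maximum, changed


def has_zero_reading(rows):
    """True if any value in any read is exactly 0.0."""
    return any(value == 0.0 for row in rows for value in row)


# Illustrative reads of (M1V1, M1A1, M1V4), not a full day of data.
rows = [
    (7.94, 61.77, 8.01),
    (7.96, 73.11, 8.01),
    (7.94, 70.00, 8.01),
]
print(column_min_max(rows))
print(has_zero_reading(rows))
```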
25-Aug F
No trips, data looks OK
25/8 10:02:01
Problem with turbines...
The power for SX5 and GB tripped at about 7am. The welder did not disable the fire alarm. When the power came back, an AC breaker in SX5 did not, and the turbines in our Maraton racks did not turn on as we thought. We ran about 17 minutes without cooling.
26-Aug S
Power cycle at 08:05:26
26/8 21:42:08
Currently experiencing problems that seem to be peripheral crate based but
exhibiting themselves in corrupted data @ fed crate.
Magnet is ramping down to 3T for next hour or so to give subdetectors
time to cool down. (multiple overheat probs)
CMS was experiencing problems with the copper water cooling circuits. Other parts of the detector were greatly affected. It may have been affecting us. We know that if the PC CRB overheats, the voltage output to the crate slowly drops. This could be what we were seeing.
27-Aug S
Both supplies off at 09h36.
After re-initializing, M1A4 has moved up to 76A from 63A the day before, and M1A6 has moved up to 33.55A from 30.66A the day before.
28-Aug M
No problems all day until:
28/8 13:28:36
Maraton1 tripped !!!!!!!!!! Since long now...... :(
13:18:53 Supply M1 trips off.
Time | Supply | M1V1 | M1A1 | M1V2 | M1A2 | M1V3 | M1A3
13:17:33 | 1.00 | 7.94 | 72.66 | 6.96 | 43.16 | 7.99 | 36.11
13:18:13 | 1.00 | 7.94 | 72.66 | 6.96 | 43.16 | 7.99 | 36.11
13:18:53 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:19:33 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:20:13 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:20:53 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:21:33 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:22:13 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:22:53 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:23:33 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:24:13 | 1.00 | 7.94 | 72.66 | 7.40 | 56.38 | 7.99 | 36.11
13:25:47 | 1.00 | 7.94 | 72.66 | 6.96 | 43.00 | 7.99 | 36.43
13:26:27 | 1.00 | 7.94 | 72.66 | 6.96 | 43.00 | 7.99 | 36.27
13:27:07 | 1.00 | 7.94 | 72.66 | 6.96 | 43.00 | 7.99 | 36.27
13:27:47 | 1.00 | 7.94 | 72.66 | 6.96 | 43.16 | 7.99 | 36.11
13:28:27 | 1.00 | 7.94 | 72.66 | 6.96 | 43.16 | 7.99 | 36.11
The table above has time, supply #, and then 3x(V, A) for supply M1_1.
At 13:18:53 the supply has tripped. Note that both the voltage read back and the current read back for channel 2 have jumped up. The voltage changes by just under ½ volt (6.96 V to 7.40 V), and the current by some 13 amps (43.16 A to 56.38 A). The read back values then remain constant until the supply is turned back on at 13:25:47. After that the voltage and current return to the same values as before the trip.
In the PCMB log file the read backs are nominal through the 13:17:57 read. They then read 0s from the 13:18:37 read through the 13:24:37 read. They are back at the nominal values beginning with the 13:25:17 read. This data is consistent with M1 tripping off. There is no visible fluctuation in the read back values just prior to the trip.
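The frozen M1A2 value of 56.38 A can be compared with the channel output currents listed in Table 1. The Python sketch below does this comparison; the function name is an assumption, and so is the idea that the Table 1 figure is the level at which a channel trips, since the actual trip thresholds configured in the supplies are not given in this report.

```python
# Output current per Maraton channel in amps, copied from Table 1.
# Treating these as trip limits is an assumption made for this example.
CHANNEL_RATING_A = {1: 150.0, 2: 50.0, 3: 100.0, 4: 150.0, 5: 50.0, 6: 100.0}


def over_rating(currents):
    """Return the channels whose current read back exceeds the Table 1 value.

    currents maps channel number -> current read back in amps.
    """
    return {ch: amps for ch, amps in currents.items() if amps > CHANNEL_RATING_A[ch]}


# Read backs for channels 1 to 3 of M1 from the table above.
before_trip = {1: 72.66, 2: 43.16, 3: 36.11}  # 13:18:13 read
at_trip = {1: 72.66, 2: 56.38, 3: 36.11}      # 13:18:53 read
print(over_rating(before_trip))
print(over_rating(at_trip))
```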
The nine status words returned and placed in the log file are:
Before trip | 1 | 0 | 1 | 0 | 0 | 0 | 1 | -32545 | 1
After trip | 0 | 2 | 1 | 0 | 0 | 0 | 0 | -32554 | 1
The table above has the 9 so-called status words. The first line is the read just before the trip, the second line the read just after.
Meaning of Status words in the log file
1 GetNoErrors 1 == no errors
2 GetCurrentFlags 0 == no error flags (flag meaning: power supply switched off due to over current)
3 GetACInLimit 1 == in limit
4 GetOverVoltProtFlags 0 == no error flags (flag meaning: power supply switched off due to over voltage on terminal, OVP)
5 GetUnderVoltFlags 0 == no error flags
6 GetOverVoltFlags 0 == no error flags (flag meaning: power supply switched off due to over voltage on sense wire)
7 GetPowerOn 1 == power is on
8 GetCrateStatus 2 bytes of status
9 GetTripIfAnyErrorEnable 1 == enable trip on any error
These are the nine status values given in the log file according to the Wiener OPC server notation.
After the trip 4 of them are changed.
Word | Name | Before | After | Change
1 | GetNoErrors | 1 | 0 | no errors to errors
2 | GetCurrentFlags | 0 | 2 | no flags to flags
3 | GetACInLimit | 1 | 1 | no change
4 | GetOverVoltProtFlags | 0 | 0 | no change
5 | GetUnderVoltFlags | 0 | 0 | no change
6 | GetOverVoltFlags | 0 | 0 | no change
7 | GetPowerOn | 1 | 0 | power from ON to OFF
8 | GetCrateStatus | -32545 | -32554 | 2 bytes of status
9 | GetTripIfAnyErrorEnable | 1 | 1 | no change
For word 2, GetCurrentFlags, the returned value is 2. We do not know what extra information this 2 conveys beyond the fact that it is not a 0. Perhaps it means that internal channel 2 is the one which tripped. That would agree with the analog current data read back.
Status byte 1 and status byte 0 from the supply. The values in the log file of:
before trip -32545 = byte 1 byte 0 ==> 1101 1111
after trip -32554 = byte 1 byte 0 ==> 1101 0110
Translate to:
For byte 0, bit 0 -> 0 == power OFF
For byte 0, bit 3 -> 0 == error in supply
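The translation above can be reproduced by treating the logged value as a signed 16-bit number and splitting it into its two bytes. The Python sketch below does this; the function names are assumptions for this example. It shows that the bit patterns quoted above are those of byte 0 (byte 1 is 1000 0000 in both cases), and that the value 223 logged for the other crates has the same byte 0 as -32545.

```python
def split_status_word(value):
    """Split a signed 16-bit log value into (byte 1, byte 0)."""
    word = value & 0xFFFF  # two's complement view of the signed value
    return (word >> 8) & 0xFF, word & 0xFF


def cleared_bits(before, after):
    """Return the bit numbers that were 1 in before and are 0 in after."""
    return [bit for bit in range(8) if (before >> bit) & 1 and not (after >> bit) & 1]


for label, value in (("before trip", -32545), ("after trip", -32554)):
    byte1, byte0 = split_status_word(value)
    print(label, format(byte1, "08b"), format(byte0, "08b"))

byte0_before = split_status_word(-32545)[1]
byte0_after = split_status_word(-32554)[1]
print(cleared_bits(byte0_before, byte0_after))
```

The meaning attached to each bit (bit 0 for power, bit 3 for error) is taken from the text above, not decoded by the sketch.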
21:38 CSC turned OFF, MTCC phase 1 finished.
Summary and Comments
The first problems we saw were related to trying to drive too much current out of the two supplies. The low voltage from the 50A channels to the PC was running over 45A and peaking at 50A or more. This caused several trips. We were also drawing near or over the maximum current available from the PFC modules in the GB. This caused them to trip off via their front panel breaker. These problems were bypassed by reducing the load to each crate. First, 3 of the 9 board sets (DMB/TMB/RAT) in each of the crates were removed, that is, unplugged from the backplane. This also turned off 3 of 9 CSCs. At this reduced load the total current draw was well below the limits. Within a day 2 of the 3 sets had been replaced and the currents were within limits.
Then we found that M1 had been set up wrong: the over voltage trip level on channel 6 was set at exactly the same point as the voltage setting. This caused some trips, as seen in the log files. This was fixed.
Then we began seeing the type 1 trip. In this, an entire supply goes to 0 output voltage and current for some 30 to 50 seconds and then returns to normal as if nothing had happened. Our first indication of this was the fact that a crate of boards would spontaneously lose their initialization, just as if there had been a power cycle.
To learn more about this we instituted ad hoc logging to text files for the power supplies. I say ad hoc because there are long range plans for logging within DCS, but it is not in place and we needed something. The first version logged a single Maraton, M1. Then all 3 were added, and then the format was changed to be more spreadsheet friendly. A text log file for the PCMB read back was also instituted.
These log files told us first that the glitch seen before was indeed a very short power cycle of the supply. The voltage and current output from the entire supply are all 0 for usually a single read. The PCMB log file told us that this was real and not a read back problem. It also showed 0 voltage at the same time stamp. After seeing this only on M1, we then saw the same behavior on M2. This last trip was seen on the 21st. Since that time we have not seen this type 1 glitch.
We had an over current trip on the 28th just 10 minutes before the magnet itself tripped off. This does not look like the PS was the cause. It looks like the current in the PC jumped up and the supply tripped off as it was supposed to. We do not know why the current went up, because on a restart the crate ran as normal with the same currents as before the trip. Of course this could be a PS problem and that possibility cannot be ruled out.
We had a lot of problems with the Maraton LVPS, but no clear evidence that they are not suited for LHC running or are adversely affected by the magnetic fields. The problems stemmed from:
1 we overloaded the supplies,
2 we had problems with setting and operating the supplies,
3 at the start our logging of the supplies was non-existent, and finally
4 we did observe behavior that remains unexplained.
Our plans are:
1 We have installed a 3rd Maraton which will supply power for all 4 PC. This will allow us to power up the full 60 degree slice and have a good margin of power available on any single Maraton.
2 We are now better at running the supplies. But we NEED a Maraton-specific manual for both the power supply and the PFC module. We have to push to get these.
3 The logging that was established has proven to be sufficient to understand what is going on. This will remain in place and we have no plans for modifying it, but could if the need arises.
4 There are a few things we can do here.
A We can set the under current trips on the Maratons so that the type-1 glitch should trip off the supply. If it doesn't, then we learn something also.
B We want to study the behavior of the supplies over the next weeks before the restart of the MTCC, even to the point of deliberately causing trips. In effect, we would reverse engineer the supplies to become more familiar with their behavior in extremes.