shards mvgfr: geek: 2008

2008/12/20

Gmail: Bad Request; Error 400

First time I've seen this: Tried to log in to gmail (via browser) and all I got was:

Bad Request
Error 400

I figured it was momentary, so went off to do other things. (So many other things to do...)

A few hours later, same thing. Next day, same thing.

This is unlike gmail. So tried a simple Google search... slow, but it did return.

Maybe Google is under a DoS attack? No news, and now Google is returning just fine, but still no gmail.

So I started searching around for anyone with a similar problem. Saw a few, though they were old.

Tried the gmail Google Group and went through what seemed to be the procedure to enter a new issue (after reviewing current open issues), but that did not give an opportunity, at the end, to open a new issue.

So, some more generic searching, using both Google and Yahoo, turned up the typical low-level support "decision tree" request to empty cache and delete cookies. Well, I'm certainly not going to delete all my cookies right away, however...

I tried a different browser and got right in - OK; we're on the right track.

I then saved my cookies (always have a backup) and looked for which cookies might be the culprit.

Got it in one: There were probably well over a hundred (!) separate cookies for the mail.google.com domain (both with and without a leading dot), that also used a path of "/mail". I deleted them, and gmail loaded right up on the next try!
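
To spot the same pile-up before deleting anything, a per-domain count is enough. Here's a minimal sketch, assuming the cookies have first been exported to a Netscape-format cookies.txt (tab-separated: domain, flag, path, secure, expiry, name, value). The function name and the export step are my own assumptions; your browser may store cookies in a different format.

```shell
# count_cookies FILE DOMAIN PATH
# Count cookies in a Netscape-format cookies.txt (assumed export format)
# whose domain matches DOMAIN, with or without a leading dot,
# and whose path equals PATH.
count_cookies() {
    awk -F '\t' -v dom="$2" -v path="$3" '
        /^#/ || NF < 7 { next }        # skip comment lines and malformed lines
        {
            d = $1
            sub(/^\./, "", d)           # ".mail.google.com" counts as "mail.google.com"
            if (d == dom && $3 == path) n++
        }
        END { print n + 0 }             # print 0 when nothing matched
    ' "$1"
}

# Example (hypothetical file name):
#   count_cookies cookies.txt mail.google.com /mail
```

A count in the hundreds for a single domain and path would point at the same culprit.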

Update 2009/03/20: Apparently there's a similar problem with Firefox.

2008/12/19

Browser password managers leak like sieves

We're browsing away, maybe checking our bank account balance and up pops the requirement to type in the password to unlock the password manager. Ah, we think, another indication that we're safe - it required me to type the password AND I was so careful to make sure it was the right site, etc.

Unfortunately, that's a false sense of security - which is worse than no security at all.

Click through the link above to read the gory detail if you like; even if you don't understand it all, at least some will make sense.

Even better, try the test yourself - watching it pull passwords out of YOUR very own password manager will really drive the point home.

I know of no actual exploits yet, and they do seem to require a compromise of the site from which the attacker wants to steal your password - however, as we've seen, such compromises are not at all uncommon.

One other item of note: For Safari in particular, I note that the default, when creating a new entry, is to give Safari blanket permission (via Access Control). While convenient, it is far less safe - and it seems that the problems detailed in the CIS article might well be avoided if Safari did not do this; at least the user would be required to type in the keychain password each time and thus get some warning.

2008/11/26

"What To Look For in a CIO"

Economic constraints mean there's business out there that others are pulling back from, for the rest of us to pick up - opportunity!

What's an org to do? Get crazy - like a fox.

http://www.cio-today.com/story.xhtml?story_id=12200AVJWGKE

We have lots of tech already, and there are people who can do amazing things with it, even before investing in anything new.

2008/11/04

Kerio 6.6 upgrade disastrous

On 11/1 I upgraded JDK's Kerio server from 6.5.2 to 6.6.

While I had a downtime window, I also applied the latest MOSX updates: ARD 3.2.2, QT7.5.5, Java 10.5 Upd2, MOSX 10.5.5, and SecUpd2008-007.

Extensive research beforehand indicated no potential issues with any of the above.

Those processes went smoothly with the exception of one boot that strangely resulted in securityd crashing and failing to restart. A simple shutdown & restart fixed it.

I did some testing and everything looked good, so opened it back up to users. Shortly after that, users reported they were unable to modify their own calendar events. It seems that KMS no longer sees them as the owner of the event; the event "Organizer".

LOTS of debugging resulted in Nate Herzog (http://isitcreative.blogspot.com) finding the key: KMS is now doing a case-sensitive compare; "john_smith@domain" is not allowed to modify an event that is assigned to the Organizer "John_Smith@domain". (Nate's done a great deal of work understanding calendaring in KMS.)

Further, merely modifying the file on the server that corresponds to the event, such that the case matched, allows a user to modify the event. (No other action is necessary; no reset of an index.fld file and no reboot.)

And in our experience, the case of the Organizer may appear either way (since before the upgrades), so we can't just do a one-time edit to force it one way; the problem will simply reappear.

It's the compare operation that's the problem; somehow case-sensitivity was newly introduced to it.
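
To make the difference concrete, here is a small sketch of the two kinds of compare. This only illustrates the behavior we observed; it is not Kerio's code, and the function names are made up for the example.

```shell
# same_address_cs A B : succeed only if the two strings are exactly equal
same_address_cs() {
    [ "$1" = "$2" ]
}

# same_address_ci A B : succeed if the two strings are equal ignoring case
same_address_ci() {
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
}

# Example:
#   same_address_cs "john_smith@domain" "John_Smith@domain"   # fails, like 6.6
#   same_address_ci "john_smith@domain" "John_Smith@domain"   # succeeds
```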

After trying for most of a day to fix it or find workarounds (Kerio support has not responded yet), we gave up and downgraded to 6.5.2. The problem persisted.

We restored the boot volume to an image made just before upgrading. The problem persists.

The backup we restored includes the OS & KMS code (/usr/local/kerio) not the data; the mailstore is on a separate volume.

This makes very little sense; apparently there's something changed in the mailstore that carries this problem forward, since the problem did not exist before this weekend's upgrades.

And in the meantime, after 14+ hours straight out, calendaring is still horribly broken.

Today, as a sanity check, I'm going to restore the entire server (code & mailstore) to its pre-upgrade state, to another machine and see if the problem somehow magically persists there.

20081104-0918 update:

1) Overnight, our IS Director, Nate Herzog (http://isitcreative.blogspot.com) had an idea that seems so far to work: Modifying the aliases for each account to be all lowercase. Simply editing the case doesn't take (it seems there remains a case-INsensitive compare there); it's necessary to remove the entry and then add it back in, with all lowercase. Editing the users.cfg file directly may also work, though we didn't want to take the server offline.

2) There's some chatter about Kerio releasing a patch to V6.6.

2008/10/30

Safari (or other WebKit) browser fails on secondary/internal pages

This is a REALLY weird one that I've been tearing my hair out over for several months:

You open a website and it loads fine; click on an internal link (a sub-page, search, whatever) and it fails - with a WebKit browser like Safari.

It's not a connectivity problem, though that's what it looks like at first.

As a matter of fact, browsers based on a non-WebKit rendering engine (ex: Firefox or Camino) are fine.

Safari gets the SYN/ACK from the target webserver and then *nothing* more from it; Safari just keeps resending the http request.

Wireshark shows "[TCP retransmission] [TCP segment of a reassembled PDU]" for the packets Safari sends after the Mac ACKs the webserver's SYN/ACK. This sends you looking for things like MTU (Maximum Transmission Unit), PMTU (Path MTU) and MSS (Maximum Segment Size) though it won't help.

My IS Director, Nate Herzog

http://isitcreative.blogspot.com/

just found the solution on MacInTouch

http://www.macintouch.com/readerreports/safari3/topic4614.html

and I found some more chatter about it, using the search terms from that article, such as Ed Marczak's

http://www.radiotope.com/content/safari-and-sonicwall

I'm blogging here, with a few more keywords, in the hope it'll save someone else some hair-tearing! :)

2008/04/02

MOSX: video mode wedged?

OK; this finally bit me one time too many - took me way too long to figure it out, but here it is, since I don't see it anywhere else:

For some reason, the video mode gets stuck (ex: 640 x 480 only) and you can't change it - here's how to fix it:

The info is stored in several files:

/Library/Preferences/.GlobalPreferences.plist
/Library/Preferences/com.apple.windowserver.plist
/Users/⟨short-name⟩/Library/Preferences/ByHost/.GlobalPreferences.⟨MAC-addr⟩.plist
/Users/⟨short-name⟩/Library/Preferences/ByHost/com.apple.preference.displays.⟨MAC-addr⟩.plist
/Users/⟨short-name⟩/Library/Preferences/ByHost/com.apple.windowserver.⟨MAC-addr⟩.plist

And there may be some in PRAM though that doesn't seem to be the case on recent systems.

To reset the video mode:

defaults delete /Library/Preferences/.GlobalPreferences ColorSyncDevices
defaults delete /Users/⟨short-name⟩/Library/Preferences/ByHost/.GlobalPreferences.⟨MAC-addr⟩ ColorSyncDevices

(Each of the above on its own line; the first requires admin.)
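
To fill in the MAC-addr part of those names, here is a small sketch. It assumes (my assumption, based on 10.4-era systems) that the ByHost suffix is the address of the built-in Ethernet port written as twelve lowercase hex digits with no colons, and that the port is en0. The function name is made up for the example.

```shell
# mac_to_byhost MAC : turn "00:16:CB:AA:BB:CC" into "0016cbaabbcc"
# (assumed ByHost naming: lowercase hex, colons removed)
mac_to_byhost() {
    printf '%s\n' "$1" | tr -d ':' | tr '[:upper:]' '[:lower:]'
}

# On the Mac itself (en0 is an assumption), look before deleting anything:
#   mac=$(ifconfig en0 | awk '/ether/ {print $2}')
#   defaults read "$HOME/Library/Preferences/ByHost/.GlobalPreferences.$(mac_to_byhost "$mac")"
```

If the read-only "defaults read" shows a ColorSyncDevices key, that's the domain name to hand to "defaults delete".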

(Simply removing the windowserver plists has never done it for me - even from single-user mode, immediately followed by a restart and PRAM reset (multiple times). And FWIW, those files have still not reappeared on my system; perhaps they're no longer used.)

Why might this happen? It's probably related (at least in my case) to moving a boot volume from one system to another - that has a different video card. Which means this might be of some help for moving from one machine to another (ex: upgrade) or deploying an image to multiple machines.

NB: The above info is from a Mac OS X Server 10.4.11 system, though at least most of this should apply to Mac OS X (Client) as well, and other releases.

2008/03/19

Trouble joining a domain? (MSWin)

Strange; this has happened more than a few times now, so time to expose my ignorance:

We have a Citrix server running on Win2K3 Server.
It's configured to bind to a domain in order to allow use of the accounts there.
The PDC is on a Mac OS X Server box (10.4.11).

The initial bind is fine and authentication works great - until it breaks.

When it breaks, simply re-entering the info yields a 1326 error; it thinks the credentials are wrong. It seems there may be some caching of credentials, though I haven't found where or how to flush; rebooting the Citrix server doesn't help.

It also doesn't help to re-bind to a workgroup, reboot, and then attempt to re-bind to the domain - same error.

What does seem to fix it is this:

On the PDC, rename the domain & save.
On the domain client, confirm a bind to new/different domain.
Rename the domain back to the original. (On the PDC.)
Confirm a bind to the original domain. (On the domain client.)

Theory: Caching is forced to flush by temporarily binding to another domain - we've only got one, hence the rename; it's probably not necessary if you've another domain to temporarily bind to.

(Update: If it's an option in your situation, a simple restart of the PDC may suffice.)

2008/03/15

Xserve RAID disassembly

If you ever need to take apart an Xserve RAID, watch for the extra screw on the case bottom!

It's essential and not in the Service Manual I've got. (Hopefully it's in a newer version.)

When you get to just about the last step and you need to rotate the front half of the chassis up (say, for replacing the midplane board) take a break. Don't keep slamming it up, muttering "I'm following the directions!" :)

Pull the whole chassis forward a bit, over the edge of the table and remove the tiny screw that's front and center on the bottom of the chassis. Then pull the front of the front half of the chassis up and over the peg - simply sliding it forward does not seem to give enough room to get the midplane board clear.

And don't forget to put it back when re-assembling. :)

Leopard: Periodic scripts don't email results (postfix)

I've been round-and-round on this one and can't find the break; hopefully someone will jump in and point out my error!

I configure /etc/aliases so that postfix knows where to send stuff addressed to root. I use "mail root" to test and it works.

Here's the first strangeness: Postfix no longer logs the send, as it did by default in Tiger. And I can't find where to bump the logging level.

I then configure /etc/periodic.conf.local to send output of the periodic scripts to both root and /var/log. The scripts run (both interactively via "periodic daily" and timed, via launchd) and logs are written in /var/log, but no email.

The behavior is consistent across every Leopard machine I've configured.

So both pieces of the puzzle seem to be working properly, though not the combination - what's missing?

AHA! After a false start, simply trying to explain it did indeed bring on the solution, and it's painfully easy: Update to 10.5.2. :)

I had forgotten two factors:

1) Running "periodic " interactively did indeed email the results properly.

2) One of my Leopard installs "strangely" did, when kicked off by launchd, email the periodics' results properly - the one updated to 10.5.2. :))

(Credit to an Apple Discussion for the final kick to put all the pieces together.)

Update #2: It's not quite so easy (as updating to 10.5.2); the default config on a Mac OS X *Server* box is somewhat different: Comment out the references to Cyrus in both master.cf & main.cf (in /etc/postfix); for those of us simply wanting to email cron/launchd results out, it's just getting in the way.

BTW: Increasing the postfix logging level is so simple it's embarrassing: Edit /etc/syslog.conf. :)

2008/03/07

Kerio 6.5.0 update/upgrade - WAIT

Wow; a 15-hour day yesterday, and I'm up at 4am still thinking about our failed attempt to upgrade from Kerio 6.4.2 to 6.5.0.

Though I woke up early, because I think I have a solution.

HOWEVER, this won't be an option for some shops, so if you're considering 6.5.0, please consider carefully, and have a rollback* plan.

It seems that there's a bug in the new 6.5.0 code (my own conjecture, shared by the two Kerio Support people I spoke with yesterday).

For us the manifestation was, as soon as everyone arrived and started logging in, the response for the users ranged from terribly slow to simply unable to log in. Users accessing via web were able to, after several minutes, log in and work almost normally. Users on Entourage (the vast majority) were frequently unable to log in at all, and when they were it took, I kid you not, hours, to sync - and the sync rarely completed before an error message. (The error messages were actually varied, with some that I haven't seen before, which were something to the effect that if you selected one of the choices, you'd be wiping mail out.)

A very painful day all around.

Back and forth with Kerio Support and nothing we tried was able to improve the situation to a noticeable degree.

However, after catching almost 5 whopping hours of sleep, I think I may have the answer.

First, a few words about our config, that are probably significant: Kerio is configured with accounts "imported" from an Open Directory server (LDAP on Mac OS X Server). Neither the OD server nor the Kerio server (also on Mac OS X Server) broke a sweat (CPU idle on both was in the 8X-9X% range, and there was plenty of free RAM; network bandwidth was fine too) during the festivities yesterday, but something was bogging down Kerio's queries of the OD server. Actually it appears that the queries and responses were fine and specifically, quick. However Kerio seemed to be getting bogged somewhere in its process around the queries. It also, for most of the day, reported almost every minute, that it had reached its limit of LDAP queries (32 simultaneous) - increasing that limit to 64 didn't help and 128 didn't either; something else was going on.

There does seem to be a correlation between the problem and lots of Entourage users, however I now think Entourage is a red herring; the issue that everyone missed so far is spam - specifically spam addressed to non-existent addresses. In which case, the message goes through to Kerio, Kerio hammers OD with (I believe) buggy code and gets bogged down for any operation, including Entourage which ends up collateral damage. (Not that Entourage is totally innocent. :)

Here's my theory for a fix: We have a Barracuda and can configure it to talk to OD directly. It's a standard anti-spam feature and it will hammer OD for exactly the same queries, only directly. Kerio will not only not bog, it'll be *faster* than it was before, since it'll never see any of that spam.
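
The idea boils down to checking each recipient against the directory before the message ever reaches the mail server. Here's a toy sketch of that check, assuming a plain-text file with one valid address per line stands in for the directory (the real thing would be the Barracuda querying OD over LDAP; the file and function names are hypothetical):

```shell
# recipient_exists LISTFILE ADDRESS
# Succeed if ADDRESS appears as a whole line, ignoring case, in LISTFILE.
recipient_exists() {
    grep -Fxqi -- "$2" "$1"
}

# filter_recipients LISTFILE
# Read one address per line on stdin and print a verdict for each.
filter_recipients() {
    while IFS= read -r rcpt; do
        if recipient_exists "$1" "$rcpt"; then
            echo "accept $rcpt"
        else
            echo "reject $rcpt"     # never handed on to the mail server
        fi
    done
}

# Example (hypothetical files):
#   filter_recipients valid-addresses.txt < incoming-recipients.txt
```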

So far just a theory...

(*Speaking of rollback: Kerio support would only give me "no guarantees" and "your plan sounds good" (grumble) however it appears to be as simple as this: Shut the server down; run the upgrade installer to UNinstall; restore your mailserver.cfg file from before the upgrade; replace the prior mailserver.cfg file (BEFORE re-running the upgrade); re-run the upgrade - which auto-starts the server; check, test, confirm...)

2008/03/02

Login freezes in the Finder, at the Spotlight icon?

sudo bash -c 'rm -rf /System/Library/Caches/com.apple.ATS* /Library/Caches/com.apple.ATS/* /System/Library/Caches/fontTablesAnnex; shutdown -r now'

# must be on a single line
# I just tweaked it a bit; credit to the Russian Bear :)

Enumerate "shares"

Handy to place into the periodic/daily script, to keep track of shares and/or similar sub-folders. (For example, you can watch how usage changes over time, to notice patterns of usage before they become a problem.)

# display usage of sharepoints
# (whereis enables it to be run on non-MOSXS system w/o err)
# ((anyone know of a call available on MOSX (client) system?))
if [ -n "$(whereis sharing)" ] ; then sharing -l | grep path | awk '{print $2}' | xargs -n1 du -ks; fi

# OR

# show usage of dirs in /shares - if it and they exist
if [ -d /shares ] ; then if [ "`find /shares -type d -mindepth 1 -maxdepth 1`" ] ; then { find /shares -type d -mindepth 1 -maxdepth 1 -print0 | xargs -0n1 du -ks; } ; fi; fi

(NB: Both are single lines and must be run as root.)

The nice thing about the second variation is: If you config your servers so that all "shares" are in the "/shares" directory (or maybe the "shares" directory at the root of each mounted volume), it shows the status even if they're not shared at the moment, which can be handy.
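
For anyone who prefers something more readable than a one-liner, here's a rough multi-line sketch of the second variation as a function. The function name and the optional directory argument are my own additions; unlike the find version, the shell glob skips hidden directories and also picks up symlinks that point at directories.

```shell
# show_share_usage [DIR]
# Print disk usage (in KB) of each immediate sub-directory of DIR
# (default: /shares). Prints nothing if DIR or its sub-directories
# do not exist.
show_share_usage() {
    dir=${1:-/shares}
    [ -d "$dir" ] || return 0
    for d in "$dir"/*/; do
        # an unmatched glob stays literal, so check before calling du
        if [ -d "$d" ]; then
            du -ks "$d"
        fi
    done
}

# Example, e.g. from a periodic/daily script run as root:
#   show_share_usage /shares
```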

2008/01/08

PX502 and Retrospect (update)

(Update to previous post.)

I now have it working; here are the changes:

New replacement PX502 from Quantum.
Directly connected (via FC) to Retro CPU.
(And therefore using only a single drive, since this CPU has only 2 FC interfaces. :()

Points to Quantum for (FINALLY) sending the full, new replacement unit instead of continuing to "nickel & dime" parts.

Points to EMC Insignia for suggesting a direct connect.

Serious points taken away from EMC Insignia for not working via a FC switch! :(

More points taken away from EMC Insignia for not adequately testing the PX502 - contrary to explicit support ("Qualified - Passed extensive in-house certification. Storage device is fully supported.") on their website.

More points taken away from EMC Insignia for saying they finally got a PX502 to test (!) and stringing me along on the results - still haven't heard and they last told me they'd give me an update a few weeks ago. :(

More points taken away from EMC Insignia for trumpeting that they'll be at MacWorld - talking about the moribund V6.1. I just hope and pray that's a smokescreen for the REALLY LONG overdue next version. We'll see soon enough...
