Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSDK-9345 Deduplicate noisy logs #4564

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

benjirewis
Copy link
Member

@benjirewis benjirewis commented Nov 18, 2024

RSDK-9345

Main changes:

  • Stops outputting logs from the same sublogger that have already been output 3 times in the past 10 seconds
  • Outputs an "aggregation" log for any logs deemed noisy (see above point) at the end of 10 seconds before another log is output
  • Parses new disable_log_deduplication JSON flag to potentially disable the logic described in the above two points

Testing changes:

  • Small changes to existing logging tests
  • Adds tests for log entry hashkeying
  • Adds tests for log entry hashkeying

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 18, 2024
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 19, 2024
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Dec 12, 2024
@benjirewis benjirewis changed the title POC dedupe-logs RSDK-9345 Deduplicate noisy logs Dec 12, 2024
@benjirewis benjirewis force-pushed the dedupe-logs branch 2 times, most recently from d55ea23 to afe9b2e Compare December 23, 2024 19:37
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Dec 23, 2024
logging/impl.go Show resolved Hide resolved
assertLogMatches(t, notStdout,
`2023-10-30T13:19:45.806Z INFO impl logging/impl_test.go:132 foo {"key":"value"}`)

// TODO(benji): Add the following assertions to test more deduplication logic.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working on testing a bit, but thought I'd throw this into review before the holiday.

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Dec 23, 2024
@benjirewis benjirewis marked this pull request as ready for review December 23, 2024 21:27
logging/impl.go Outdated
// `recentMessageEntries`, i.e. `LogEntry`s that `String`ify identically should
// be treated as identical with respect to noisiness and deduplication. Should
// not be used for actual writing of the entry to an appender.
func (le *LogEntry) String() string {
Copy link
Member

@dgottlieb dgottlieb Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this is for computing the key "string" for recentMessageEntries. I don't think this needs any of the special formatting?

Two remarks:

  • If we're going to use this method for creating a special/unique key, let's not call it a String method. For all of the reasons you wrote above -- the consumer of this method requires some crucial properties (great job describing them!). A method named String generally is assumed to not be bound by such strict requirements.
  • But more preferably -- I'm a little concerned about the performance impact of this. More about the number of small allocations rather than CPU time. I'd recommend looking into the hashing functionality Go provides to create a unique identifier for a LogEntry.
    • I don't need to see a benchmark, but I think* Go makes it easy to do memory allocation benchmarks with the -benchmem flag. If that sounds like fun.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting I had never heard of the maphash package before. Maybe we can talk offline; I implemented something with it, but that implementation has worse performance than what I have currently according to a benchmark I wrote 😆 . I assume I'm using the type a little incorrectly.

I'm Writeing strings that I would normally append to the string created by the String method here to a maphash.Hash that's created from a global seed initialized at runtime. I assume it's the creation of the strings that's causing same/worse performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with @dgottlieb a bit offline. He and I had a similar idea of implementation for hashing, and it is indeed about the same in terms of allocations/op and nanos/op in terms of performance for a benchmark I wrote. Will stick with the method I have here but rename it to HashKey.

logging/impl.go Outdated
}

// Track entry in recentMessage maps.
stringifiedEntry := entry.String()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth -- this can be computed before acquiring the lock. I don't expect that to a notable impact in the grand scheme of things. But that is a common technique in this type of scenario

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; moved above lock.


// DisableLogDeduplication controls whether to disable de-duplicating noisy
// logs; defaults to false and can be specified in robot config.
DisableLogDeduplication = atomic.Bool{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why we use this name for the config flag (we want this to be on for robots by default), but I'm curious if you gave any consideration and/or had a reason why this global shouldn't be EnableLogDeduplication/DeduplicateLogs? The two benefits I see:

  • Easier to read symbols in predicates that are in the affirmative
  • The "least surprising behavior" (to a programmer) is for deduplication to be off. I think it's going to be more common to add tests that do not want deduplication. And moreover forget* it's happening. And tests that do need deduplication will know to turn it on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree that it's least surprising for dedupe to be off, but also can see it being better that the names are consistent. but no strong feelings on this from my side

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused with the concern/request here. To clarify; the field in the API is called disable_log_deduplication (although pretty sure we're on the same page there.)

Were you thinking I should rename (and correspondingly invert) this global to be something like EnableLogDeduplication / DeduplicateLogs? Or, that I should do that and also turn off deduplication by default on robots (as you said, I thought we wanted that on)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify; the field in the API is called disable_log_deduplication (although pretty sure we're on the same page there.)

Yep, we're in agreement here. The API exposed to the user invoking the CLI program/configuring via app, we've agreed will see the phrase disable_log_deduplication.

Were you thinking I should rename (and correspondingly invert) this global to be something like EnableLogDeduplication / DeduplicateLogs?

Exactly -- just because the exposed configuration to the user is "disable" doesn't mean the variable tracking this state in our system must also be "disable".

Or, that I should do that and also turn off deduplication by default on robots (as you said, I thought we wanted that on)?

When starting a ./viam-server -- deduplication should be on by default. (and now that this forced my to make my proposal more concrete):

  • Parsing a json config from the cloud app should ensure that. I think this PRs change to configData does this properly
  • But perhaps the Config struct member for this should instead be named EnableLogDeduplication.
    • Resulting in a shuffle of logic here when converting proto -> Config.

Copy link
Member

@dgottlieb dgottlieb Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also -- I can't comment off-diff, but did we forget to copy these log duplication members between the configData <-> Config in the Config.[Un]MarshalJSON methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused with the concern/request here.

The "concern" part was the change to TestLoggingWithFields needing to disable log deduplication. I think in an ideal world, that test author would not need to even be aware deduplication exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context here and the discussion offline. I've made the atomic now stored on the logger registry DeduplicateLogs (in the affirmative.) The value stored on config.Config is still DisableLogDeduplication, so the "shuffling" happens in a slightly different place than where you pointed out. I wanted to keep config.Config and config.configData 1:1 with the names of fields in JSON/proto (besides snake_case -> camelCase and vice-versa conversions.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also did totally forget to add to [Un]MarshalJSON methods. Added that logic in and a test (and a second test for PackagePath whose lack of JSON-marshal logic bit me the other day.) I filed RSDK-9717 to add more testing around JSON roundtripping robot configs and left a TODO; we don't have much now.

@@ -304,6 +315,13 @@ func TestSublogger(t *testing.T) {
}

func TestLoggingWithFields(t *testing.T) {
// Disable log deduplication for this test, as it logs "noisily" and makes
// assertions on those logs.
DisableLogDeduplication.Store(true)
Copy link
Member

@dgottlieb dgottlieb Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think* this global variable could instead, without loss of generality, be put on the logger registry object? "Registry" certainly doesn't sound like the right name, but I think it's global enough for our purposes. And it would isolate the variable for tests like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with Dan here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can throw it on the registry.

Copy link
Member

@cheukt cheukt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, had some similar thoughts to Dan

@@ -304,6 +315,13 @@ func TestSublogger(t *testing.T) {
}

func TestLoggingWithFields(t *testing.T) {
// Disable log deduplication for this test, as it logs "noisily" and makes
// assertions on those logs.
DisableLogDeduplication.Store(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with Dan here


// DisableLogDeduplication controls whether to disable de-duplicating noisy
// logs; defaults to false and can be specified in robot config.
DisableLogDeduplication = atomic.Bool{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree that it's least surprising for dedupe to be off, but also can see it being better that the names are consistent. but no strong feelings on this from my side

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Jan 7, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to test This pull request is marked safe to test from a trusted zone
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants